Provenance as a key technology to reduce information pollution

To verify the veracity of information, it is important to assess its authenticity and authoritativeness, and tracking the origins of information is important for understanding both. So far, authenticity and authoritativeness have largely been assessed only by understanding the content itself, which has been the focus of media literacy programmes. However, information pollution has made it increasingly difficult for non-experts to assess the authenticity and authoritativeness of information, a problem that is likely to become worse. This document introduces the concepts, discusses existing efforts and finally outlines the commitment needed from governments.

Introduction

The term provenance is commonly used in technical literature to describe the practice of tracking not only who and what originates information, but also who and what changes it. Conventionally, provenance is metadata that follows the information.

Consider an image, shot with a certain camera by a certain person. The camera information and an identifier of the photographer should in most cases be available as provenance metadata. Consider then the possibility that the image is photoshopped: at the very least, information that it has been photoshopped, and by whom, should be added. If the image is then published in social or editor-run media, the provenance information should be checked to see whether it is consistent with the content of the image.
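The image example can be sketched as a chain of records, where each record names an action and an agent, carries a hash of the content at that point, and links back to the previous record. This is only a minimal illustration: the field names and the agent identifiers are made up for the example, and it is not how C2PA or PROV represent provenance. In particular, a real system would also sign each record, since a plain hash chain can be replaced wholesale by whoever holds the file.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def add_record(chain: list, action: str, agent: str, content: bytes) -> list:
    """Return a new chain with one more provenance record appended."""
    previous = chain[-1]["record_hash"] if chain else None
    record = {
        "action": action,                 # e.g. "captured" or "edited"
        "agent": agent,                   # hypothetical identifier of who or what acted
        "content_hash": digest(content),  # hash of the content after this action
        "previous": previous,             # link to the preceding record, if any
    }
    # Hash the record itself so that later changes to it can be detected.
    record["record_hash"] = digest(json.dumps(record, sort_keys=True).encode())
    return chain + [record]


def verify(chain: list, content: bytes) -> bool:
    """Check that the chain is unbroken and that it ends at the given content."""
    previous = None
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["previous"] != previous:
            return False
        if record["record_hash"] != digest(json.dumps(body, sort_keys=True).encode()):
            return False
        previous = record["record_hash"]
    return bool(chain) and chain[-1]["content_hash"] == digest(content)


original = b"raw image bytes"
edited = b"retouched image bytes"
chain = add_record([], "captured", "camera:example-model", original)
chain = add_record(chain, "edited", "person:example-editor", edited)

print(verify(chain, edited))    # the published image matches the chain
print(verify(chain, original))  # the unedited image does not match the last record
```

A publisher could run a check like `verify` before publication, and a reader's tool could run it again afterwards. Whether the agents named in the records can be trusted is a separate question that the hashes do not answer.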

To allow citizens to assess the authenticity and authoritativeness of the information, there should usually be tools that can help them verify the correctness of the provenance metadata. In general, the public should realize that information that comes with verifiable provenance metadata is more trustworthy than information without. However, attaching provenance metadata is clearly not desirable in all cases, as in some cases that metadata is highly sensitive. Where the information comes from a whistleblower, it must be removed. In such cases, free editor-run media are important, as they can assess the veracity of the information, and the public would then need to rely on the reputation of the media to assess the trustworthiness of the information.

Recommended reading and existing initiatives

Henry Story has written quite extensively on the role of provenance in his work on Epistemology in the Cloud. In an OpenAI-sponsored paper titled Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations the authors describe the outcome of a symposium where potential mitigations were discussed. Provenance was discussed in the paper, noting that

Because technical detection of AI-generated text is challenging, an alternate approach is to build trust by exposing consumers to information about how a particular piece of content is created or changed.

They also reference a prominent technical approach by industry actors that have formed the Coalition for Content Provenance and Authenticity (C2PA) and have produced technical specifications, including a harms modelling overview.

An earlier and more academic standardization effort was undertaken under the auspices of the World Wide Web Consortium to produce the PROV family of documents. It did not see very extensive implementation, but I have used it to provide an unbroken chain of linked provenance data from individual changes to my own open source contributions, through my packaging, and on to Debian packages. Unfortunately, the data sources this relied on are now defunct.

Nevertheless, it is clear that the main challenges are not technical. Conceptually, provenance is quite well understood, and if it is done on a large scale, it could have a significant impact. It is therefore much more important to discuss the implications for how technology is governed.

Normative Technology Development

Whatever technology choices are made in this area will set down societal norms. This illustrates a crucial point that not only law sets norms in the form of regulation. There are also cultural, religious and biological norms, and also technological norms. As the C2PA coalition analyzed harms, they made choices as to what should be considered a potential harm, and in designing a system, those choices informed the technical decisions that they made, and thus formed norms. This is not to say that the choices that they made were wrong, but they are to a great extent political choices that should have been influenced by elected representatives.

Now, given that they sport significant implementation capacity, it makes sense for the companies that form this coalition to be represented too, but this should have been balanced by other participants that have a democratic mandate. For these participants to have an impact, they too must have implementation capacity, so that they can demonstrate that the solutions proposed by those with a democratic mandate can be implemented and have an impact.

Doing this in open ecosystems with non-exclusionary terms, so that the results become Digital Commons, is essentially what I refer to as normative technology.

With strong public engagement, democratic institutions will also gain insight into the details of the technology, which will greatly assist lawmakers, as these norms may eventually be written into law. Moreover, democratic institutions will need to assist in the dissemination of the technology throughout society, as this is probably the most difficult obstacle for provenance technology to overcome if it is to have societal impact.

A vision of what good looks like

Too much of the public debate on technology is about the harms of social media and technology in general. We also have to say what we think technology should be like to be good. Many of these things must be discussed in great detail, but I just wanted to state in a few points what my current thinking is on this.

I think of four pillars of technology that are important for future societies:

  • Respect and empowerment of individuals
  • A public sphere with empowered collective action by communities
  • A sociotechnological landscape that is regulable
  • Commercial opportunities at any scale for everyone

Now, to detail these:

Respect and empowerment of individuals

  • Individuals, by virtue of being citizens of societies, are respected as basic autonomous agents.
  • Individuals are unimpeded in their pursuit of knowledge, as knowledge is the basis for their ability to act as autonomous agents.
  • Individuals are empowered to create technology, literary and artistic works.
  • Individuals have the right to be connected to digital networks.
  • Individuals have the right to digital identities that can be used in online ecosystems.
  • Individuals have the right to basic data management facilities online.
  • Individuals have the right to possess devices in which they can place their ultimate trust.
  • Individuals have the right to information security.

A public sphere with empowered collective action by communities

People do not thrive in privacy alone; we are first and foremost social. Sometimes, people want to be left alone, but sometimes, they want to go out in public and wear a smile. Sometimes, they need somebody’s support for their cries. Sometimes, they want to find a pleasant beach to relax on, sometimes they want to find a mountain to challenge themselves. Sometimes, they want to discuss matters of importance to them, sometimes they want to organize an event for a select group of friends, sometimes for a large crowd. The public sphere must encompass all these human endeavors, i.e. the technologies that support it are “societal scale technologies”. Technologies developed only for democratic debate or only for gathering a group of friends are unlikely to succeed; the vision must be so universal that it can host the entire public sphere.

  • Positive social relationships and enhanced social cohesion should be key design goals.
  • Democracies should develop technologies to enhance democratic processes.
  • Technologies should promote establishment of epistemic commons.
  • Citizens should have the possibility to interact with government services through digital interfaces, but no citizen must be left behind because they do not wish to, or are not able to, do so.
  • Communication between parties in a community must be secured.
  • Communities must have technologies to help establish trust between different parties. They must do so without compromising the ultimate trust individuals have in their devices.
  • Technologies should support the formation of, and sound interaction between and within, groups that form on a permanent or ad hoc basis for social interactions.

A sociotechnological landscape that is regulable

Centralisation of power by technology companies has led to a situation where democracies do not have sufficient knowledge to regulate technology development sufficiently.

  • Enabling individuals to seek knowledge about open technologies unimpeded can shift the power balance towards gaining regulability in democracies.
  • Democracies need to form governmental institutions to set norms through developing technologies for citizens using technologists in close collaboration with cross-disciplinary teams.
  • Democracies will develop new frameworks to guide detailed technological development.  
  • Through these institutions, open technology projects can be governed in collaboration with private companies in greater detail than today.
  • Technology acceptance is driven by individuals who find that virtuous cycles emerge with technologies designed with positive social relationships in mind.
  • Pluralism is a cornerstone, helped by decentralised architectures and by normative technology development institutions being commonly supported by national governments.
  • However, decentralisation should support societal goals, and is not a goal in itself.
  • Technology should support communities in regulating and sanctioning undesired behaviour in an effective and scalable manner, so that law enforcement isn’t overwhelmed.

Commercial opportunities at any scale for everyone

  • Architectures should be designed so that individuals are empowered over providers of digital infrastructure.
  • Architectures should be designed so that the distribution of usage of different systems is gently sloped.
  • When citizens assume the role of consumer, they should be empowered to make their requirements clear, so that businesses can develop products with a clear market fit and reduce the need for marketing in the form of advertisements.
  • Payment systems should be standardised so that payments of even small size can be made over networks with low risk of fraud.
  • eCommerce should enable societies to transparently collect taxes to encourage development of society as a whole.

Targeted advertising is scary even when it misses

Social media companies have managed to entrench the idea that they are extremely accurate at targeting, and that they are therefore worth all the advertising money spent on them. This has major consequences for how society treats and governs technology development, and it is therefore dangerous to assume too much about the question. That the ads feel accurate in the way NRK reported can be due to a number of psychological mechanisms. To understand the depth of the issue, we have to look at a broader public debate.

After the 2016 US presidential election, a debate arose over whether Facebook’s algorithms could have influenced the outcome. Mark Zuckerberg first dismissed the idea as absurd, but quickly reversed his position. Why? Out of a genuine wish to take responsibility for the undermining of democracy, or because many people made him aware that Facebook’s entire existence depends on a business model that is about influencing people to change their behaviour so that they buy products advertised on Facebook and Instagram? He cannot have it both ways: that Facebook’s algorithms cannot influence elections when they run political advertising, while the same algorithms are effective when they are used to advertise products.

The claim that advertising on social media is so effective does not seem to get much support in research, both because there is not much research and because much of it is based on anecdotes. I have my own anecdote, because the family’s diesel car was recently due to be replaced with something electric, and it was hard to find something that met our needs. Over the last couple of years I have therefore been active on social media and on car sites, actively searching for cars that satisfied the requirements, contacting dealers and manufacturers, and using the structured searches I found. I am usually careful not to leave much data behind, but I deliberately made an exception, so that it is hardly possible to leave more data behind than I did.

The ad systems work such that every time you see an ad, an auction takes place: several algorithms interact to sell the ad space to the highest bidder, and it is known that cars are among the things that fetch the highest prices in those auctions. Admittedly, all the ad sellers quickly understood that I was looking for a new car, but they did not understand something as basic as the fact that the car had to be rechargeable; that I live in Norway should have been enough for the algorithms. I knew for several months which car I would go for, while the ad for it only arrived once the contract was signed. So, the targeted advertising failed in a case where it should be most profitable and where it had been given masses of data over several years. This was not a scientific experiment, but it is quite telling.

There is much to suggest that today’s social media are harmful. It is likely that a certain manipulation of people’s emotions is possible on a large scale, and that certain aspects of the algorithms lead to lower self-esteem in vulnerable people. Since conspiracy thinking is often about trying to form a complete picture of how the world “really” works, and we see that people with a sense of exclusion tend to find each other in several communities, it is likely that social media lead to polarisation in vulnerable groups. It is possible that social media cause such problems to escalate. These are probably very scary consequences of today’s social media. In parts of the world where the situation is fragile, this can be catastrophic.

We therefore have an international debate whose starting point is that today’s social media almost have the power to turn people into slaves without a will of their own. I am afraid that a big mistake is being made, because if the advertising is not that effective, it matters a great deal for how we treat the technology giants. Among other things, we need to understand:

  1. Are people moved towards extreme views because the technology drives them there, or does this arise first and foremost because people have legitimate needs that are not met by the society they live in, and the technology amplifies feelings that are already there?
  2. Is it true, as the technology giants answer when the EU puts forward proposals for regulation, that small and medium-sized businesses depend on targeted advertising?
  3. How realistic do we think it is to replace today’s social media with something that is better for people and that can better reflect the norms we can agree on in a democracy? If the advertising really does not work, Facebook is a bubble that may burst.

The renowned psychologist Martin Seligman tells in the book “Flourish” about a visit to Facebook when it was fairly new, and about how much good he envisioned that they could do. It really did not go the right way, but we can take back control!

As of today, we do not have enough research-based knowledge to know whether targeted advertising really works that well, but what we do know is that it is scary enough regardless, so we have to create something better. We just must not uncritically believe the hype if that makes the task seem harder than it is!

Discussion on my RDF Hypermedia Proposal

I have had some offline discussions about my Read Write RDF Hypermedia proposal (please let me know if you’d like to be named), and there are some things that I’d like to elaborate on.

There is clearly a need to contrast my proposal with other technologies out there, and make it clearer where this makes sense.

Initially, I developed this with the ideal of making the RDF control statements look somewhat like natural language sentences, that is, if you as a client are authorized to delete a resource description, the RDF should tell you that in terms that are likely understood by someone who has never seen RDF before. Developers might just look at what they get from a server and start to work with it. After discussing it, I think it might have another advantage: It seems to me that we will have a lot of different protocols on different levels in the stack, and that this protocol heterogeneity could be addressed in part by moving controls into the body of the message. Let’s discuss that first.

Protocol Heterogeneity

HTTP has served us well, and will continue to do so. Nevertheless, it seems to me that with the “Internet of Things” (IoT), we will develop a “last mile” problem, in which we cannot or do not want to control what application layer protocol to use. I haven’t done a lot of work in this space, but it seems very diverse. Obviously, if you can use a Wifi protocol, HTTP will be available to you. I did some work with an Arduino (which is a microcontroller with very little RAM), where I found that the best choice for interacting with it in my application was to add an Ethernet shield and use HTTP. In other cases, it probably wouldn’t be. Some protocols maintain a layered model, and some, like Zigbee, have gained an IP layer. Others, like DASH7, touch every layer in the OSI stack. In the latter case, it is mostly easy to map HTTP methods into operations on DASH7 devices, but the point here is that there is a whole zoo of protocols, some of which would need a mapping, and some of which have an IP layer, but where you do not want to be tied to an application protocol for some reason. That is not to say that HTTP will not be the application protocol used on the open Web, but that there are endpoints that should be free to choose their protocol for interacting with their internals.

If the semantics of control operations sit in HTTP, the translation may become unnecessarily complex, and so feel restrictive to implementors. That’s one of the reasons I would prefer to have controls in the message body.

But it also implies that the controls should be as precise as possible. For example, I proposed

</test/08/data> hm:canBe hm:mergedInto .

and not that we encode POST as an operation. POST is not precise enough, but we should define in the vocabulary that for the HTTP protocol, hm:mergedInto would be implemented with a POST.
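To make the idea of a precise control with per-protocol bindings more concrete, here is a minimal Python sketch of such a lookup. Only the HTTP binding of hm:mergedInto to POST comes from the text above. The table layout, the function name, and the second protocol ("example-sensor-bus" with an "APPEND" operation) are invented purely for illustration, since the vocabulary that would carry these bindings is not defined yet.

```python
# Bindings from a protocol-neutral control to a protocol-specific operation.
# Only the HTTP binding for hm:mergedInto is taken from the text; the
# "example-sensor-bus" protocol and its "APPEND" operation are made up.
BINDINGS = {
    "hm:mergedInto": {
        "http": "POST",
        "example-sensor-bus": "APPEND",
    },
}


def operation_for(control: str, protocol: str) -> str:
    """Look up how a control is carried out over a given protocol."""
    try:
        return BINDINGS[control][protocol]
    except KeyError:
        raise LookupError(f"no binding for {control} over {protocol}") from None


print(operation_for("hm:mergedInto", "http"))                # POST
print(operation_for("hm:mergedInto", "example-sensor-bus"))  # APPEND
```

The point of the sketch is that the client reasons about hm:mergedInto, and only the last step depends on which protocol happens to be in use.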

Finally, it should be noted that such controls should be defined for protocols that use very different operations. I have for example proposed similar controls for video:

<rtsp://camhost1.orienteering.org/stream> 
  a dcmit:MovingImage ;
  hm:can hm:play, hm:pause, hm:stop .

I also proposed more detailed, more application-specific controls, like

<rtsp://camhost2.orienteering.org/stream> 
  a dcmit:MovingImage ;
  hm:canBe hma:votedFor ;
  rev:hasReview </user/foo/vote/1>, </user/bar/vote/56> .

where the definition of hma:votedFor might be fairly complex, possibly defined in terms of other controls. I haven’t thought that far yet.

Contrasting With Other Efforts

There are several other vocabularies and protocols that are in the same space. None, as far as I know, take the same starting point: That controls should look and feel like natural language sentences. I think that is the most important point, that is the key enabler of making “View Source” development feasible for newcomers.

There are, however, in-band hypermedia controls that define allowed operations in RDF in the Hydra specification. Hydra, though, doesn’t read like natural sentences; its appeal is more towards those who already know RDF. Moreover, it isn’t inclined to define precise controls like hm:mergedInto: it allows you to formulate which operations to do, and then hydra:method is explicitly an HTTP method. As argued above, this may create a gap on the last mile, as the semantics become unclear further down the stack and it would require more work mapping between various protocols.

Then, there’s the Web Access Control (WAC) spec. On the surface, my proposal seems like a replacement for that, but that’s not intended. On the contrary, I see these as complementary. In fact, when I first started my implementation, the first thing I did was to use the predecessor of this spec as well as WebID+TLS to create the authentication and authorization framework. I later removed that code to focus on the hypermedia aspects, but I still think WAC is the right way to specify the authorization rules. The WAC rules should be exposed so that clients that prefer to can gain an instant overview of the application they are interacting with, but I think that many users will find it much easier to understand a nearly-natural language sentence that tells them exactly what they can do to the resource they are presently interacting with.

I guess I should have taken the complementary perspective with Linked Data Platform as well. So far, I’ve been of the opinion that LDP should be superseded by hypermedia. My main point of criticism has been that LDP doesn’t have the right primitives, i.e. LDP’s primitives need to be understood in terms of HTTP, but that means a newcomer has to have incentives to read up. I think they need to have the “hey, I can do this”-experience first. The message and tools have to provide that. Then comes the argument from above that the emergent IoT-induced protocol heterogeneity problem could require us to formulate controls in message bodies.

That said, it is also pretty clear that my hypermedia proposal could be formulated to only add some RDF to resource descriptions, and thereby augment rather than replace LDP. So, I hereby apologize for my attacks on LDP; let’s go complementary.

Is the Controls Resource Needed?

In my first post, I argued that in the case where we expose a resource for reads for unauthenticated clients, but require authentication and authorization for writes, we need a separate resource to challenge the client. I admit that this is suboptimal, but I don’t see a lot of good alternatives. One alternative that was proposed to me is to add hypermedia controls to the response to an unauthenticated client as it is not the controls that require auth* but the right to use them. I feel that would be more confusing, as even an authenticated user may not be authorized to use certain controls, and so, a client may authenticate and then see certain controls disappear because they are not authorized to use them.

At this point, I think that implementation experience is needed to find the best approach, and importantly, study how inexperienced but enthusiastic implementors interact with resources.

Read Write RDF Hypermedia

The beauty of “View source”

When I first came to the Web in 1994, one of the first things I did was to “View source”. Although I had been programming since I was a kid, I didn’t come into it with “proper training”. As I looked at other people’s HTML, I felt the thrill of thinking “I understand this! I can actually make things on this Web thing!” Then, I went on to do that. Now, I also remember the introduction of CSS. I was skeptical. Should I learn another thing, when I had something that worked? Eventually, I was won over by the fact that I could share styles between all my documents. That made life so much easier, as I had accumulated a bunch of them at that point.

I was won over to RDF in 1998 too, but since then, I have never seen what I first saw in HTML, the “View source” and we’re ready to go. In fact, when I look at modern Web pages, I don’t see it with JS and HTML either. Something seems to have been lost. It doesn’t have to be that way. An RDF triple can be seen as a simple sentence; it can be read and understood as easily as HTML was back in the day, if we just allow systems to do it. That’s what I’ve set out to do.

Hypermedia

I’ve been thinking about read-write RDF hypermedia for a long time, and I started implementation experimentation more than 5 years ago, but the solution I had in mind at the time where I wrote the above presentation didn’t work out. Meanwhile, I focused on other things. The backstory is a long one, I’ll save that for later, as I think I’m onto something now and I’d like to share that. I also have to admit that I haven’t been following what others have done in this space, which may or may not be a good thing.

Hypermedia, the idea that everything you need in an interaction to drive an application can be found in the messages that are passed, is an extremely good fit with my “View source” RDF. Now, the problem was to make it work with Linked Data, common HTTP gotchas, and yet make it protocol independent.

What I present now is what I think is the essence of the interaction. Writes necessarily have to be backed by authentication and authorization (and that’s what I started coding, then I found I should focus on the messaging first). For now, I have a pre-alpha with a hardcoded Basic Auth username and password, and it only does the simplest Linked Data. However, I think it should be possible to bring it up to parity with Linked Data Platform. That would be the goal anyway. The pre-alpha is up and running on http://rwhyp.kjernsmo.net/

One way to do Linked Data is to have a URI for a thing, which may be for example a person. A person isn’t a document, so it has a different URI from the data about it. For example, http://rwhyp.kjernsmo.net/test/08 could be my URI, and if you click it in a browser, you would get a flimsily formatted page with some data about me. With a different user agent like curl, you would have a 303 redirect to http://rwhyp.kjernsmo.net/test/08/data and that’s where the fun starts. This is what is intended to be where developers go.

If you do that, you will see stuff like (the prefixes and base http://rwhyp.kjernsmo.net are abbreviated for readability):

</test/08> a foaf:Person ;
 foaf:name "Kjetil Kjernsmo" ;
 foaf:mbox <mailto:kjetil@kjernsmo.net> .
</test/08/data> hm:toEditGoTo </test/08/controls> ;
 void:inDataset <http://rwhyp.kjernsmo.net/#dataset-test> .

In there, you find the same information about me, augmented with hm:toEditGoTo standing between the URL of the data, and something that looks similar, but has a controls at the end. The idea is that it should be fairly obvious that if you want to edit the resource, you should go there and see what it says. For machines, the definition of the hm:toEditGoTo predicate should also be clear. Note that the choices of having data and controls in my URIs are not significant; you can have anything, as long as you use that predicate to link them.

If you do, you will be challenged to provide username and password. Try it out with testuser and sikrit. Then, you’ll see something like this:

</test/08/controls> hm:for </test/08/data> ;
 a hm:AffordancesDocument ;
 rdfs:comment "This document describes what you can do in terms of write operations on http://rwhyp.kjernsmo.net/test/08/data"@en .
</test/08/data> hm:canBe hm:replaced, hm:deleted, hm:mergedInto .

See, the last sentence tells you that this data document can be replaced, deleted, and merged into. Now, the idea is to define what these operations mean in the context of a certain protocol with an RDF vocabulary. I haven’t done that yet, but perhaps we can guess that in HTTP replaced means using PUT, deleted means using DELETE and to merge into means POST?
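As an illustration of how a programmer might "just look at the message and write for that", here is a small Python sketch that pulls the hm:canBe controls out of an abbreviated copy of the controls document and applies the guessed HTTP mapping. Both the function name and the mapping table are assumptions for this example; the mapping is only the guess from the paragraph above, not a published vocabulary. The regular expression is a shortcut that works for this simple message, and real code should use a proper Turtle parser.

```python
import re

# Abbreviated copy of the controls document shown above.
MESSAGE = """
</test/08/controls> hm:for </test/08/data> ;
 a hm:AffordancesDocument .
</test/08/data> hm:canBe hm:replaced, hm:deleted, hm:mergedInto .
"""

# The guess from the text, not a defined vocabulary.
GUESSED_HTTP_METHODS = {
    "hm:replaced": "PUT",
    "hm:deleted": "DELETE",
    "hm:mergedInto": "POST",
}


def controls_for(message: str, resource: str) -> list:
    """Return the hm:canBe objects stated for a resource in a simple message."""
    # Match "<resource> hm:canBe a, b, c ." and capture the object list.
    pattern = re.escape(f"<{resource}>") + r"\s+hm:canBe\s+([^.]+?)\s*\."
    match = re.search(pattern, message)
    if match is None:
        return []
    return [term.strip() for term in match.group(1).split(",")]


controls = controls_for(MESSAGE, "/test/08/data")
print(controls)
print([GUESSED_HTTP_METHODS[control] for control in controls])
```

A resource with no hm:canBe statement yields an empty list, which a client can read as "nothing to do here with these credentials".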

Then, it is the idea that these operations can be described in the message itself too. For example, we can say that if you use HTTP for hm:mergedInto, you have to use POST, and we can reference the spec that contains the details of the actual merge operation, like this:

hm:mergedInto hm:httpMethod "POST" ;
 rdfs:comment "Perform an RDF merge of payload into resource"@en ;
 rdfs:seeAlso [ 
   rdfs:isDefinedBy <http://www.w3.org/TR/rdf-mt/#graphdefs> ;
   rdfs:label "RDF Merge" ] .

It is an open question how much of this should go into what messages.

At this point, I use RESTClient but there are many similar tools that can be used, just use the same credentials, set the Content-Type to text/turtle, and POST something like

<http://rwhyp.kjernsmo.net/test/08> foaf:nick "KKjernsmo" .

to http://rwhyp.kjernsmo.net/test/08/data then get it another time, and see it has been added.
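For those who prefer a script over a GUI client, the same request can be sketched with Python's standard library. This is only an illustration under a few assumptions: the function name is made up, the payload declares the FOAF prefix explicitly (I assume a Turtle parser needs that), and the request is built but not sent, since the test server may not be running when you read this.

```python
import base64
import urllib.request

URL = "http://rwhyp.kjernsmo.net/test/08/data"

# The triple from the example above, with the FOAF prefix declared.
BODY = (
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    '<http://rwhyp.kjernsmo.net/test/08> foaf:nick "KKjernsmo" .\n'
)


def build_merge_request(url, turtle, username, password):
    """Build (but do not send) a POST of Turtle with Basic Auth credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/turtle",
            # Sent up front, without waiting for the server's challenge.
            "Authorization": f"Basic {token}",
        },
    )


request = build_merge_request(URL, BODY, "testuser", "sikrit")
print(request.get_method(), request.full_url)

# To actually send it: urllib.request.urlopen(request)
```

Note that the script supplies the credentials without being challenged, which is exactly what the interactive clients I have tried do not do.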

So, it is fairly straightforward. It is basically the way it has always been done. So, what’s new? The new thing is that a beginner has had their hand held through the process, and once we have the vocabulary that tells man and machine alike how to use a specific protocol, they can both perform their tasks based on very little prior knowledge.

At some point, we may want to hook this into the rest of the Semantic Web, and use automated means to execute certain tasks, but for now, I think it will be very useful for programmers to just look at the message and write for that.

Main pain point

The main pain point in this process was that clients aren’t supplying credentials without being challenged. My initial design postulated that they would (there’s nothing in the standard that discourages it), and so, I could simply include the hm:canBe statements with the data. After thinking about it for some time, I decided to take a detour around a separate document, which would challenge the client for reads and writes alike, and that the data document would only challenge for writes.

There are obviously other ways to do this, like using an HTTP OPTIONS method, but I felt it would be harder for a user to understand that from a message.

Feedback wanted

Please do play with it! I don’t mind if you break things, I’ll reload the original test data now and then anyway. There are some other examples to play with if you look for void:exampleResource at the root document.

I realize many cannot be bothered to create an account to comment, and I haven’t gotten around to configuring better login methods on my site, so please contact me by email. I also hang out on the #swig channel on Freenode IRC as KjetilK, and I will be posting this to a mailing list soon too.

BTW, the code is on CPAN. I have had the Linked Data library for 8 years, there are some minor updates there, but most is in a new addon module. Now, I have to admit, all these additions have exposed the architectural weaknesses of the core library, I need to refactor that quite a lot to achieve LDP parity.

The Greatest Achievement in Social Media

If we managed to create a decentralized social media ecosystem, how would we go about identifying the hardest problems to tackle, and what would be our greatest achievement if we succeeded? If this seems like an odd question, bear with me, dear reader: Many technologists are motivated by great technical challenges, and this is an attempt to channel that energy into social problems.

Many people whom I would consider relatively like-minded would say that things like censorship resistance and anonymity are the absolute requirements, and so the crowning achievements. I do think they are important, but only within a broader, social context that takes into account a wide variety of social problems. We have to explore the borders to understand where this may fail to bring social benefit, and we have to consider other options in those cases.

I think it is very important to think about future, decentralized social media not as an application, not like another Facebook, but as an ecosystem, where social interaction is a common ingredient of many interconnected applications contributed by many different actors.

In an earlier post that I wrote in Norwegian, I mentioned the revenge porn problem, where media is put into a sexual context and distributed without the depicted person’s consent. Another problem in the same space is “grooming”, where a person is manipulated into sexual abuse.

Grooming often follows a pattern, where an older person contacts a minor, lying about their age and/or gender and has the minor send them relatively innocuous pictures based on some false pretense. With those pictures, the perpetrator then threatens to expose those pictures to classmates, parents or others to put the minor into a coercive situation, where abuse and coercion can only escalate.

It is clear that one should never enter such a situation with an anonymous peer. However, it is equally clear that one should not enter such a situation with a peer that knows your complete identity either, as that can result in more direct forms of sexual abuse. The grooming problem is a problem because there exists no reasonable and commonly used middle ground, and therefore people resort to unsafe channels. Most of these cases could probably be prevented if people had a strong online identity that could be used to pseudonymize them using selective attribute disclosure and verifiable claims. With the former, the two peers can disclose only relevant and non-compromising information, for example age and gender (even that can be problematic, so technology should also be developed to help ensure that their full identity cannot be compromised). With verifiable claims, both peers can verify that the information disclosed by the other is accurate. They should be empowered by the social media platform to enter a context where they have this kind of pseudonymity, and where they get the extra security. If, for example, a teen enters a dating site, they will use their strong, verified online identity, but the security system of the dating site will see to it that nothing that can compromise the identity is exchanged unintentionally. If the peers eventually choose to exchange further details or meet in real life, they should be able to indicate to the dating site that they have chosen to do so, and if the meeting results in abuse, this information can be passed to authorities.
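To make the combination of selective attribute disclosure and verifiable claims a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any existing system: the attribute names, the sample identity and the functions issue, present and verify are all made up for this example, and an HMAC with a shared key stands in for the public-key signature a real issuer would use. The idea is that the issuer commits to each attribute with a salted hash and signs the set of commitments, so the holder can later reveal one attribute without exposing the others.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical issuer key. A real system would use a public-key signature
# so that verifiers never hold the issuer's secret; HMAC is a stand-in here.
ISSUER_KEY = secrets.token_bytes(32)


def commit(name, value, salt):
    """Salted hash that hides an attribute until its holder reveals it."""
    data = json.dumps([salt, name, value]).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def sign(digests):
    """Stand-in for the issuer's signature over the list of commitments."""
    payload = json.dumps(digests).encode("utf-8")
    return hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()


def issue(attributes):
    """Issuer: commit to every attribute and sign the sorted commitments."""
    salts = {name: secrets.token_hex(16) for name in attributes}
    digests = sorted(commit(n, v, salts[n]) for n, v in attributes.items())
    return {"digests": digests, "tag": sign(digests)}, salts


def present(credential, attributes, salts, reveal):
    """Holder: disclose only the chosen attributes, with their salts."""
    disclosed = {n: (attributes[n], salts[n]) for n in reveal}
    return {"credential": credential, "disclosed": disclosed}


def verify(presentation):
    """Verifier: check the issuer's tag, then each disclosed attribute."""
    cred = presentation["credential"]
    if not hmac.compare_digest(sign(cred["digests"]), cred["tag"]):
        return None
    verified = {}
    for name, (value, salt) in presentation["disclosed"].items():
        if commit(name, value, salt) not in cred["digests"]:
            return None
        verified[name] = value
    return verified


# Made-up identity, used only for illustration.
identity = {"name": "Kari Nordmann", "age_group": "13-15", "school": "Example School"}
credential, salts = issue(identity)
shown = present(credential, identity, salts, reveal=["age_group"])
print(verify(shown))  # only the age group; name and school stay hidden
```

The peer sees a verified age group and nothing else. The salts matter: without them, a peer could guess a hidden attribute and confirm the guess by hashing it.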

“Revenge porn” is a much harder problem. The name itself is problematic, as for example artistic or simple nudes, indeed almost anything, may not have had any sexual intentions, but may be twisted into a sexual context by a perpetrator. Moreover, the distribution of such media may not be for revenge, but still be felt as an abuse by the depicted. This underlines that it is never OK to blame the victim and that the problem is much broader than it seems at first sight. A fairly authoritarian approach may be advocated: One may argue that people cannot be allowed to retain full control of their own devices, so that authorities may have the option to delete offending material. Censorship may be advocated, so that material will not propagate. Longer prison sentences may be advocated. I am opposed to all of these solutions, as they are simplistic and fail to address other valid concerns. Taking away people’s control of their own devices contributes to alienation and distrust in the social relevance of technology, something we need to rely on for the solution to the grooming problem above, but also many other problems. I am also opposed to prison sentences: prison is often a breeding ground for more crime, and should be reserved for extreme cases.

We should be able to fix this (in the sense that the problem is marginalized and made socially unacceptable) without resorting to such authoritarian measures. It takes changing culture, and while there’s no technological quick fix to changing culture, technology and design can contribute. The Cyber Civil Rights Initiative is a campaign to end non-consensual porn, and has published a research report with many interesting findings. The group advocates some of the more authoritarian solutions, and while I am sympathetic to the need for legislation, I believe this should be considered a privacy problem, and dealt with in generic privacy legislation, as I believe is the case in Europe. Without having personal experience, I suspect that privacy violations where private media are stolen and exposed, even without any sexual undertones, can be felt as much of a violation as “revenge porn”, and they should therefore be dealt with similarly.

Page 22 of the report summarizes what kind of sanctions would have stopped the perpetrators, in their own words. It is understandable that legislative measures are put forward, as those come out as the most significant. I nevertheless think it is important to note that 42% said that they wouldn’t have shared abusive media “if I had taken more time to think about what I was doing”, and 40% “if I knew how much it would hurt the person”. These are very important numbers, and things that can form the basis for design and cultural change. It is now possible to detect pictures with certain content with a relatively high probability, and make the poster think more carefully. Make them answer some questions. We could build a technology that asks “do you know how much this could hurt?”, and then a culture where friends ask the same. This becomes even easier if the victim is identified, as is not uncommon. In that case, the media could be tagged as “not OK to distribute”, and friends of the victim could appeal to the perpetrator’s conscience and participate in stemming the distribution. Laws are for the 20% who said that nothing would have stopped them, and building a culture should also shrink this number significantly. Finally, 31% wouldn’t have posted the media “if I had to reveal my true identity (full name)”. Even without a full name, a pseudonymized identity, like the one discussed above, could act as a significant deterrent, and would also help propagate warnings about how further distribution would be inappropriate and/or illegal.

This makes me hopeful that this is a problem where a well-designed, decentralized and rather censorship-resistant social media ecosystem could have a meaningful impact.
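The “make them answer some questions” idea can be sketched in a few lines of Python. This is a hypothetical illustration only: the Upload class, the sensitivity score (imagined to come from some image classifier), the 0.8 threshold and the wording of the prompts are all assumptions made for this example, not something taken from the report or from any existing platform.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    sensitivity: float           # hypothetical classifier score in [0, 1]
    marked_not_ok: bool = False  # depicted person tagged it "not OK to distribute"


def prompts_before_posting(upload, threshold=0.8):
    """Return the questions the poster must answer before the post goes out."""
    prompts = []
    if upload.marked_not_ok:
        prompts.append("The person depicted has marked this as not OK to distribute.")
    if upload.marked_not_ok or upload.sensitivity >= threshold:
        prompts.append("Do you know how much this could hurt the person depicted?")
        prompts.append("Do you want to take more time to think before sharing?")
    return prompts


# An ordinary picture passes straight through; a flagged one triggers questions.
for upload in (Upload(0.1), Upload(0.9), Upload(0.2, marked_not_ok=True)):
    print(upload, prompts_before_posting(upload))
```

The design choice here is that the system does not block anything by itself. It only slows the poster down with the two questions that, according to the numbers above, a large share of perpetrators say would have stopped them.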

Another reason that the system has to be censorship resistant is that the same ecosystem has to be universal: for example, it must also function as a platform for public debate under authoritarian regimes. I would like to point out the work of Hossein Derakhshan, who initiated a large blogosphere in Iran that contributed to a more open public debate. Derakhshan was arrested for his blogging in 2008, and released in 2014. He wrote a retrospective analysis of the development in the intervening years that is important well outside of Iran, called “The Web We Have to Save”. I have great respect and admiration for the work that Derakhshan has done, and it underscores the importance of having a Web that can be the foundation both for situations where it is important to stop the spread of certain material and for situations where it is important to keep it flowing.

To achieve this, we must be very careful with the design of the ecosystem. For example, the trusted identity is straightforward to achieve in Norway, where we have good reason to trust government, but it would be counter to the goal of an open debate and therefore stifling to do so in Iran. Trust in the identity is an example of something that must be built in very different ways around the world, and the ability for local programmers to integrate different approaches into the same ecosystem is therefore instrumental to the possibility of making it work for everyone.

It is undeniable that there is a tension between the above goals, but I think it is easy to agree they are both important. To have the same platform do both of these things is a great technological challenge, and I think that if we can do this, it will be a very important achievement for all of mankind.

Enabling Diversity Through Decentralised Social Media

Awareness of the problematic aspects of social media has been brewing for a long time, but recently, it has exploded in the public debate, with the effects of “fake news” and “echo chambers”, and also on the editorial role of Facebook. The response of Facebook has not impressed commentators. To make my position clear from the start: This is a very complex problem to solve, and it involves social, economic and psychological problems that need to be addressed. However, I wish to make the case that the technology also matters, and that an approach must necessarily begin with changing the underlying technology. In this blog post, I would like to list the primary social problems that I have personally identified. However, I’ll try to take a positive angle: this is not just about Facebook and its failings, this is about what we should create for the future. Out of the social problems, the echo chamber problem is only one.

The Echo Chamber Problem

My personal history with this problem doesn’t start here. It started 20 years ago, when I took on managing the Web site of the Norwegian Skeptics Society. The subjects we were concerned with were fairly marginal, and easy to label as superstition: astrology, UFOs, etc. It soon became quite apparent that the problem wasn’t to reach out with information. The actual problem was much, much harder: it was to get that information into closed minds. Since then, this problem has grown from being a marginal problem considered only by a few, to a major social problem.

Now, some may say that the problem is not technological, and possibly that technology is indifferent to the problem: any technological platform can help get information into closed minds, and any technological platform can provide infrastructure for opposing viewpoints and enable a debate between them. I disagree, and I think recent events make that clear. Even if technology alone cannot open closed minds, there are technological choices that are critical in enabling the needed platform for an open public debate.

First, it is important to constrain the scope of the technological problem, by understanding the problems that are of different origin. The reason why fake news thrives on Facebook is complicated, and this article argues it comes down to the emotions of the people in your social network. This article contains a discussion of why “popping the bubbles” is problematic. It is also important to be reminded of the effects of identity protective cognition. Facebook has itself published research on the issue. What is interesting is that nowadays, my Facebook feed is full of anti-Facebook sentiments, but none of these articles showed up there. I had to go look for them, and only after I started to share these articles in my News Feed myself did similar articles start to surface. Now, the “echo chamber” and “filter bubble” metaphors may not reflect the true nature of the problem, but Facebook’s argument that they are not doing so badly holds mainly because of the lack of an ambitious baseline: we don’t know yet what could be achieved if a structured, cross-disciplinary approach was made. Even if the most important effects are social and psychological, if information isn’t available, it can certainly not be acted upon.

To further understand the problem, we should listen to Facebook’s Patrick Walker, who responded to the “Terror of War” photo removal with a keynote given to Norwegian editors. The keynote is well worth watching, not just because it provides insight into the core problems, but also because it provides hints to the road ahead.

Patrick Walker himself gives an excellent summary of the accusations they face:

“[…] people said that we weren’t transparent, there’s no way of knowing what our algorithms do, that publishers don’t get any warnings about changes to the news feed, that we refuse to accept that we were editors and were abdicating our responsibility to the public, that we were trying to muscle in on the media business and eroding the fundamentals of journalism and the business models that support it. That by encouraging editors to optimize for clicks and showing people what they want to see, we were creating filter bubbles and echo chambers.”

He then goes on to put forward the values of Facebook: to give people the power to share and to make the world more open and connected. I have no reason to doubt that they are actually trying to do that. In fact, they may often be succeeding. However, it is the Web that to a greater extent gives people the power to share. The Web is what truly empowers people, Facebook included, and Facebook is merely a thin layer on top of the Web. The problem is that it is truly a layer. If Facebook was just yet another possible social network, with a limited scope just like Mark Zuckerberg proposed, that’d be fine.

But it isn’t. It wields much more power than that, since it has effectively put the function of a social network into its own hands and controls that. Patrick Walker then goes on to describe how difficult it is to create global community standards for Facebook, and how problematic it would be if these standards did not apply to all of Facebook. He then concludes that people are free to say whatever they want elsewhere, but not on Facebook. This part of the talk is very compelling, and calls into question the appropriateness of calls for national or EU-mandated regulations. But it also makes it clear that Facebook cannot play the role of a public debate platform; he pretty much said that himself. The moment an opinion becomes unpleasant, and those are the opinions that need the most protection, it has no place on Facebook. He says it clearly: “Facebook is not the Internet”. This makes it clear that to solve most of the problems he mentioned, we have to create an alternative to Facebook that provides the platform for an open, public debate.

It is also clear from his talk that many of the problems that Facebook faces are due to Facebook needing a single set of rules, and while he has made a good case for that for Facebook, it doesn’t have to be that way on the Internet. In fact, the architecture of the World Wide Web is decentralised, and there is no single actor, such as Facebook, that should control such an important feature as a social network. Decentralising the social network will have the effect of allowing a plurality of standards. Facebook only has to solve Facebook’s problems. These problems are not universal, and that’s why I say that the underlying technology matters. A decentralised social network has a different set of problems, and some are difficult, but it is my clear opinion that the hardest problems are created by Facebook, and can be solved as decentralisation enables diversity and pluralism.

The General Lack of Diversity

As Walker noted, they have been accused of attacking the ability of the press to perform quality journalism. There is some merit to the argument, even if it was easy to predict more than 15 years ago, before the social networks grew large and strong, that social media would be the most important distribution channel. Now, the press has to choose between letting Facebook be the editor-in-chief and hopefully a benevolent provider of a working business model, or maintaining their autonomy, essentially starting over and figuring out how to make money in social media on their own.

The problem is not just about Facebook or the press. Recently, Elon Musk said of Tesla’s carpooling efforts that it “wasn’t Tesla vs. Uber, it is the people vs. Uber”, implying that Uber is a monopoly problem waiting to happen.

The centralisation is not only a problem for opinions in the public debate and business models, though both are important aspects. It creates difficulties for permissionless innovation, an aspect central to the Web and the reason why Facebook could succeed. It also limits the research that can be done on the platform: for example, no one else could have done the research in the article we referenced, which places the owner of the social network in an ethically problematic privileged position.

The General Lack of Universality

With diversity challenged, another key aspect of the Web is also challenged: Its universality. Not everyone has a place on Facebook. The obvious ones that are excluded are pre-teen children. Not that they seem to care. In social networks, they certainly have a place. Moreover, people with cognitive disabilities will find the environment on Facebook very hostile, where they can be fooled into helping the spread of fake news and other material, including material that Facebook legitimately may delete. To some, much damage can be done before appropriate action can be taken. Moreover, their friends are not empowered to help. That’s not what the social network should have been. What I first had in mind was to port the social cohesion of real life to the Web, but the opposite has happened. This is a great failure, but at least a problem centralised systems could solve if they wanted to.

Combating Abuse

It gets even harder once we get to the problems surrounding revenge porn and “grooming”. I want to make clear that Facebook is not the target for this criticism, I’m talking more generally here. The problem is severe, but it is a problem that has just a few victims, and I believe that it cannot be solved if one is thinking only in terms of commercially viable systems. The technical contributions towards solving this problem are something I think needs to be government funded. Decentralisation is not necessarily helpful technologically, but standards and adoption of one approach could make a large impact. I think it is critical to addressing this problem that we enable trust models to work on the Web and that people are enabled to look out for each other.

Respect for the Individual

Finally, and this is a key problem for the future as well as the present, there is the respect for the rights of individual citizens. We are moving towards an Internet of Things, where all kinds of sensors will provide lots of data, often connected to persons, and mining them can yield information that each citizen would certainly consider highly sensitive. I believe we cannot simply go on, neither in research nor in business technology, and pretend these problems are not significant, or that they are simply Somebody Else’s Problem. I reject the notion that the majority doesn’t care. They care, but they are left powerless to remedy the problem. I hold it as a moral imperative to empower people, and we, as technologists, have to tackle this problem.

I fear that we may have an anti-vax or anti-GMO type backlash if we do not commit to a privacy-aware infrastructure, so even if the moral imperative isn’t accepted, one should take this possibility seriously.

A Different World is Possible

I have already pointed out that decentralisation is an important technological enabler for a different world, and stated that this new infrastructure must be privacy aware. Obviously, neither the motivation nor the solution of a decentralised social network is new; it has been tried for a long time. So, how can we possibly succeed now? I think several things have changed: Most importantly, it is now understood that this is a problem that has large effects for entire societies. Secondly, we have standards, and we have learned much from the past, both in terms of technological mistakes and from research.

Now is the time for action. We need a broad understanding of the problems, but we also need to start with the fundamental technological infrastructure that we need to provide to the world, and try out several possible approaches, approaches that can further the understanding of the cross-disciplinary solutions. I hope to be able to contribute in this space.