information retrieval – Business Networks

About two years ago, I asked Matt Mullenweg a question. Basically, I suggested WordPress ought to recognize its role as an open source search engine, and to realize that this is something the world actually desperately needs. There’s a video of the Q&A event in Paris on the homepage of wordpress.tv, and I created a deep link to my particular question here.

To summarize, Matt’s answer was excellent. He addressed all of the main relevant issues with great technical expertise. He also acknowledged one of the main shortcomings, and that is the issue I want to address here.

Matt acknowledged that “You could have sites that tie together better than they do today”.

In the following, I want to compartmentalize various aspects of the WordPress community. By and large, WordPress consists of humans and technology – and also the interactions between some humans and other humans, between humans and technology, and also the interactions between some kinds of technology and other technologies. All in all, it really is a quite complex system, but at first glance – taking a wide scope, or a so-called “bird’s eye view” – one can identify a sort of triangle, with each corner representing different “players” in a game of interaction:

Authors and such
The WordPress organization and such
Readers, audiences, and such

Obviously, this is a vast oversimplification, because these groups are actually not completely distinct from one another. For example, when an internet user (normally considered to be a “reader” or “audience” participant) types in a URL or uses their mouse to move the cursor or to click on something, they are actually sending information rather than receiving information.

Yet I consider these groups useful because they embody what I see as quite clear information asymmetries within the WordPress community:

I apologize for this diagram – that’s why I call it “Completely Ridiculous and Cringeworthy”. If it doesn’t make you cringe, then congratulations! 😉

Let me explain. It’s basically just a bunch of strings that flew into my head, having to do with interactions within the WordPress community. They are arranged in a way that might indicate something about the types of interactions that take place. So if a creator takes out their smartphone and shoots a picture, and then posts that picture to their blog, then that might qualify as a “tech” (hardware) interaction. Yet if the same person uses their smartphone to open a browser window, then that technology is being used to consume information. In both of these cases, WordPress might also be involved. When WordPress delivers HTML to a browser, that would be a “user”-oriented interaction. When WordPress reads information from a database, that would generally be an interaction with (something like) an “author”.

The crux of the matter becomes more obvious when I point out that even though WordPress software is open source, information asymmetries continue to exist among these three subgroups – perhaps in many ways, but at any rate in one very crucial way: the amount of data available to each subgroup.

In the remainder of this blog post, I wish to focus on one particular type of data that is particularly unevenly distributed: the semantic organization of blogs via “tag” and “category” data.

Before I can underscore how valuable such information is to what Tim Berners-Lee once envisioned as a “semantic web”, I need to point out that the semantic organization of a blog is not an issue related to a single blog post. Instead, it is only when someone is able to survey all of the blog posts that each individual blog post becomes meaningful – much in the same way as the words uttered in a sentence become meaningful if and only if the entire sentence can be considered meaningful. Another good example of this phenomenon is almost any classification system. The conundrum of whether particular life forms ought to be classified as plants or animals only makes sense for classifying life forms, not for rocks, (inorganic) chemical compounds or elements (which are all considered to be non-living). In all of these cases, the meaning of something is at least in part derived from what it is not. There is a dichotomy of content vs. context, just as the optical “illusion” of foreground vs. background (commonly known from the vase vs. two faces silhouette image) only makes sense when contrasting black vs. white.

Hence, categorizing something as “business” makes sense insofar as it isn’t (or perhaps is also) categorized as “politics”. Likewise, “health” may become particularly meaningful if and only if it is clearly stated that “health” is a “science” and that all instances of the semantic category “science” actually refer to “sciences excluding health”. (For those interested in learning more about the science of thesaurus design and/or vocabulary control [both subsets of the field of information science], such annotations have traditionally been referred to as “scope notes” 😉 )

What does all of this have to do with the purported information asymmetry?

I’m glad I asked! 😀

Without being able to view all of the category information and/or all of the tag information, a user or reader cannot really make a reasonable estimate of what this particular blog post is about. If a particular blog post is tagged “fake news”, does that mean it is not about “propaganda”? If another particular blog post is tagged “spam”, does that mean it is also about “advertising”? The way WordPress works now, the author always knows, the user never (or at least extremely rarely*) knows, and the WordPress organization sometimes knows.

There are valid reasons for such asymmetries. As Esther Dyson pointed out well over a decade ago, the price of copying data is basically nil. If someone made all the data in their database freely available, then this data would exist in innumerable copies at the drop of a hat. Yet even though providing carte blanche access to everything may seem risky, bots like Google will almost certainly pore over anything and everything remotely available in order to make a pretty penny off of it anyway.

Google is not your friend. Facebook is not your friend. There is a long list of companies and actors out there, none of whom is your friend. Does the WordPress community need to move toward establishing and promoting friendly relationships? I think it should, and I even have a concrete suggestion about how to do it.

I think WordPress should establish something like an open marketplace for ideas. Now this is very complicated, very abstract, not yet very clear and also not yet completely worked out, so please bear with me.

First of all, there should be very low barriers to entry – think “free” (but it would actually not be 100% free, as there’s no such thing).

Second, it would be – at least potentially – low risk. Market participants would basically be allowed to introduce themselves to each other. “OMG!” I hear you say: “NO WAY!! I get enough robocalls and other spam already!” Just a minute – remember I said it would not be 100% free. In order to introduce themselves (i.e., their blog), participants would need to lay bare their entire list of categories and tags – metaphorically speaking, they would be required to drop their pants and expose themselves. Perhaps there might also be an opportunity for a third party to verify that the information provided is not spoofed in some way, but even without verification it seems quite obvious that the recipient of such an introduction could quickly and easily ascertain whether there is any chance at all that such a “semantic friendship” might make sense (or whether on the other hand it seems too risky).
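To make the idea of "quickly and easily ascertaining" a match a bit more concrete, here is a minimal sketch of how a recipient might compare an introducing blog's full list of categories and tags against their own. Everything in it is an assumption for illustration: the function name `taxonomy_overlap`, the sample term lists, and the choice of a simple Jaccard score (shared terms divided by all terms) are hypothetical and not part of WordPress.

```python
def taxonomy_overlap(mine: set[str], theirs: set[str]) -> dict:
    """Compare two blogs' complete category/tag vocabularies.

    Hypothetical helper: terms are matched by lowercased name only,
    which ignores synonyms and scope notes.
    """
    mine_norm = {term.strip().lower() for term in mine}
    theirs_norm = {term.strip().lower() for term in theirs}
    shared = mine_norm & theirs_norm
    union = mine_norm | theirs_norm
    return {
        "shared": sorted(shared),
        "only_theirs": sorted(theirs_norm - mine_norm),
        # Jaccard similarity: 0.0 means nothing in common, 1.0 means identical
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up vocabularies for two blogs
my_terms = {"Business", "Politics", "Real Estate", "Health"}
their_terms = {"business", "real estate", "Sports", "Travel"}

report = taxonomy_overlap(my_terms, their_terms)
print(report["shared"])       # terms both blogs use
print(report["only_theirs"])  # terms that would be new to me
print(round(report["jaccard"], 2))
```

A score like this is only a first filter. The recipient would still want to look at the terms themselves, since two blogs can share the label "business" and mean quite different things by it.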

If the recipient were to decide to accept the offer of friendship, then both participants would be able to view each other’s current semantic organization and also to receive updates regarding changes in the semantic organization of each other’s websites… – but wait: there’s even more!
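The "updates" part could be as simple as comparing two snapshots of a friend's term list. The following is only a sketch under that assumption; `taxonomy_changes` and the sample snapshots are hypothetical names and data, and nothing here describes an existing WordPress feature.

```python
def taxonomy_changes(old: set[str], new: set[str]) -> dict[str, list[str]]:
    """Report which terms a friend added or dropped between two snapshots."""
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }


# Made-up snapshots of one friend's categories, taken at two points in time
before = {"business", "politics", "health"}
after = {"business", "politics", "sports"}

changes = taxonomy_changes(before, after)
print(changes)
```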

I think it would be very valuable if such friends could declare which semantic categories they especially agree on. For example, perhaps friends do not particularly agree on “health” topics, but they pretty much completely agree when the topic is “sports”.

Let’s imagine that one market participant has 100 such friends, and that they are in agreement with 5 of them regarding the topic “business”, and with 1 of these 5 they are also in agreement regarding “politics”. Now consider what could happen if some user visits this participant’s website and views an article (or blog post) about business. Imagine how useful it could be for that user to be able to see that there are 5 other blogs that are related, 1 of which is even also related from another perspective. Now also imagine an adjustable slider were available, offering new and different perspectives – linking to other categories of information, which are also available (and perhaps even also linked to / aligned with “business”) from those 5 friends who agree with respect to the “business” category. In this way, the categories of agreement could function as springboards to other categories which are not even on the map for the current blog… thereby providing additional, new perspectives related to the current topic. For example, if another group of friends agree with the current blog on the topic “real estate”, then if the current blog’s focus is New York, other blogs may offer unique perspectives from Los Angeles, San Francisco, Miami, etc.
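The 100-friends example can be worked through in a few lines. This is a hypothetical sketch: the blog names (`friend-1.example` and so on), the friends' category lists, and the helper functions are all invented for illustration. It shows two lookups: which friends agree on a topic, and which of their other categories could act as "springboards" because the current blog does not cover them.

```python
from collections import Counter

# Agreements: friend blog -> categories on which the two blogs agree.
# 100 friends, 5 agree on "business", 1 of those 5 also on "politics".
agreements = {f"friend-{i}.example": set() for i in range(1, 101)}
for i in range(1, 6):
    agreements[f"friend-{i}.example"].add("business")
agreements["friend-1.example"].add("politics")

# Full category lists the agreeing friends have shared (made-up data)
friend_categories = {
    "friend-1.example": {"business", "politics", "real estate"},
    "friend-2.example": {"business", "real estate"},
    "friend-3.example": {"business", "travel"},
    "friend-4.example": {"business"},
    "friend-5.example": {"business", "real estate", "travel"},
}


def related_blogs(topic: str) -> list[str]:
    """Friends that have declared agreement on the given topic."""
    return sorted(blog for blog, topics in agreements.items() if topic in topics)


def springboards(topic: str, own_categories: set[str]) -> list[tuple[str, int]]:
    """Categories offered by agreeing friends that the current blog lacks,
    ordered by how many of those friends use them."""
    counts = Counter()
    for blog in related_blogs(topic):
        for category in friend_categories.get(blog, set()):
            if category != topic and category not in own_categories:
                counts[category] += 1
    return counts.most_common()


business_friends = related_blogs("business")
also_politics = [b for b in business_friends if "politics" in agreements[b]]
new_perspectives = springboards("business", {"business", "politics"})

print(len(business_friends))  # friends related via "business"
print(also_politics)          # related from a second perspective as well
print(new_perspectives)       # categories not on the current blog's map
```

The slider mentioned above could then be a threshold on the counts: at one end only categories used by most agreeing friends are shown, at the other end every category that any of them uses.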

All of this is simply one hypothetical example, and all of it is pretty much fantasy out-of-the-box thinking. Why not try it? What could go wrong? What other ways might also be useful? Where do we go from here? Anywhere? Nowhere? Somewhere? Maybe somewhere better than what we have right now?

