Similarity Map and WD Usage Volume. Each bubble represents a client project. The size of the bubble reflects the volume of Wikidata usage in the respective project; a logarithmic scale is used in this plot. Projects similar in respect to the semantics of Wikidata usage are grouped together. Use the tools next to the plot legend to explore the plot and hover over bubbles for details.





WDCM Overview :: Wikidata, WMDE 2019

Contact: Goran S. Milovanovic, Data Scientist, WMDE
e-mail: goran.milovanovic_ext@wikimedia.de
IRC: goransm



Wikidata Usage Highlights. Each bubble represents a client project. The size of the bubble reflects the volume of Wikidata usage in the respective project.
Projects similar in respect to the semantics of Wikidata usage are grouped together. Only the top five projects (of each project type) in respect to Wikidata usage volume are labeled.





Wikidata Usage Tendency. Each bubble represents a Wikidata semantic category. These categories represent one possible way of categorizing the Wikidata items. The size of the bubble reflects the volume of Wikidata usage from the respective category. If two categories are found in proximity, the projects that tend to use one of them also tend to use the other, and vice versa.





Wikidata Usage Distribution: Project Usage Rank-Frequency. Each point represents a client project. Wikidata usage is represented on the vertical axis and the project usage rank on the horizontal axis. Only the top projects per project type are labeled.





Wikidata Usage Distribution: Project Usage log(Rank)-log(Frequency). Each point represents a client project. The logarithms of Wikidata usage and project usage rank are represented on the vertical and horizontal axis, respectively. The top three projects per project type are labeled.





Client Project Types. Wikidata usage breakdown across the client project types. Each row represents one client project type. Semantic categories of Wikidata items are placed on the horizontal axis, while the respective usage counts are given on the vertical axis.





Client Projects Usage Volume. Wikidata usage across the client projects. Use the slider (below the chart) to select the range of client projects by percentile ranks*.
Note: The chart presents at most the 30 top projects (in terms of Wikidata usage volume) from the selection.



*The percentile rank of a score is the percentage of scores in its frequency distribution that are equal to or lower than it. For example, a client project that has a Wikidata usage volume greater than or equal to 75% of all client projects under consideration is said to be at the 75th percentile, where 75 is the percentile rank.







Client Project + Semantic Category Usage Cross-Tabulation. Wikidata usage breakdown across the client projects, project types, and semantic categories. Sort the table by any of its columns or enter a search term to find a specific project, project type, or Wikidata semantic category.





Client Project Usage Tabulation. Wikidata usage per client project. Sort the table by any of its columns or enter a search term to find a specific project or project type.





WDCM Overview Dashboard

Description


Introduction


This Dashboard is a part of the Wikidata Concepts Monitor (WDCM). The WDCM system provides analytics on Wikidata usage across the client projects. The WDCM Overview Dashboard presents the big picture of Wikidata usage; other WDCM dashboards go into more detail. The Overview Dashboard provides insights into (1) the similarities between the client projects in respect to their use of Wikidata, (2) the volume of Wikidata usage in every client project, (3) Wikidata usage tendencies, described by the volume of Wikidata usage in each of the semantic categories of items that are encompassed by the current WDCM edition, (4) the similarities between the Wikidata semantic categories of items in respect to their usage across the client projects, (5) the ranking of client projects in respect to their Wikidata usage volume, and (6) the Wikidata usage breakdown across the types of client projects and Wikidata semantic categories.


Definitions


N.B. The current Wikidata item usage statistic is defined as the number of pages in a particular client project where the respective Wikidata item is used. Thus, the current definition ignores the usage aspects completely. This definition is motivated by the present constraints on Wikidata usage tracking across the client projects (see Wikibase/Schema/wbc entity usage). As Wikidata usage tracking systems mature, the definition is subject to change. The term Wikidata usage volume is reserved for total Wikidata usage (i.e. the sum of usage statistics) in a particular client project, group of client projects, or semantic category.
By a Wikidata semantic category we mean a selection of Wikidata items that is operationally defined by a respective SPARQL query returning a selection of items that intuitively match a human, natural semantic category. The structure of Wikidata does not necessarily match any intuitive human semantics. In WDCM, an effort is made to select the semantic categories so as to match the intuitive, everyday semantics as much as possible, in order to assist anyone involved in analytical work with this system. However, the choice of semantic categories in WDCM is not necessarily exhaustive (i.e. they do not necessarily cover all Wikidata items), nor are the categories necessarily mutually exclusive.
The Wikidata ontology is very complex and the product of the work of many people, so there is an optimization price to be paid in every attempt to adapt or simplify its present structure to the needs of a statistical analytical system such as WDCM. The current set of WDCM semantic categories is thus not normative in any sense and may change at any moment, depending upon the analytical needs of the community.

The currently used WDCM Taxonomy of Wikidata items encompasses the following 14 semantic categories: Geographical Object, Organization, Architectural Structure, Human, Wikimedia, Work of Art, Book, Gene, Scientific Article, Chemical Entities, Astronomical Object, Thoroughfare, Event, and Taxon.
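
The two definitions above (usage statistic as a page count, usage volume as a sum of usage statistics) can be illustrated with a minimal Python sketch. The record layout, the function names, and the data are assumptions made for this example only; they do not describe how WDCM itself stores or processes usage data.

```python
from collections import defaultdict

# Hypothetical usage records: (client project, page title, Wikidata item ID).
# All values are made up for illustration.
records = [
    ("enwiki", "Berlin", "Q64"),
    ("enwiki", "Germany", "Q64"),
    ("enwiki", "Germany", "Q183"),
    ("dewiki", "Berlin", "Q64"),
    ("enwiki", "Berlin", "Q64"),  # repeated record: the page is still counted once
]

def usage_statistics(records):
    """Per (project, item): the number of distinct pages where the item is used."""
    pages = defaultdict(set)
    for project, page, item in records:
        pages[(project, item)].add(page)
    return {key: len(page_set) for key, page_set in pages.items()}

def usage_volume(stats, project):
    """Usage volume of a project: the sum of its item usage statistics."""
    return sum(count for (proj, _item), count in stats.items() if proj == project)

stats = usage_statistics(records)
print(stats[("enwiki", "Q64")])       # pages in enwiki that use Q64
print(usage_volume(stats, "enwiki"))  # total usage volume of enwiki
```

Because the statistic counts pages, using the same item several times on one page does not increase it, which is one way to read the remark that usage aspects are ignored.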


Wikidata Usage Overview


The similarity structure in Wikidata usage across the client projects is presented. Each bubble represents a client project. The size of the bubble reflects the volume of Wikidata usage in the respective project. Projects similar in respect to the semantics of Wikidata usage are grouped together.
The bubble chart is produced by performing a t-SNE dimensionality reduction of the client project pairwise Euclidean distances derived from the Projects x Categories contingency table. Given that the original higher-dimensional space from which the 2D map is derived is rather constrained by the choice of a small number of semantic categories, the similarity mapping is somewhat imprecise and should be taken as an attempt at an approximate big picture of the client projects similarity structure only. More precise 2D maps of the similarity structures in client projects are found on the WDCM Semantics Dashboard, where each semantic category first receives an LDA Topic Model, and the similarity structure between the client projects is then derived from project topical distributions.
While the Explore tab presents a dynamic {Rbokeh} visualization alongside the tools to explore it in detail, the Highlights tab shows a static {ggplot2} plot with the most important client projects marked (NOTE. Only the top five projects (of each project type) in respect to Wikidata usage volume are labeled).
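
As a rough illustration of the distance step described above, the following Python sketch computes pairwise Euclidean distances between the rows of a small Projects x Categories contingency table. The table values and function names are made up, and the sketch assumes raw counts are compared directly; whether the dashboard normalizes the counts first is not stated here.

```python
import math

# Hypothetical Projects x Categories contingency table (usage counts, made up).
categories = ["Human", "Taxon", "Book"]
table = {
    "enwiki": [900, 300, 120],
    "dewiki": [850, 280, 100],
    "specieswiki": [10, 700, 5],
}

def euclidean(u, v):
    """Euclidean distance between two equally long usage profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_distances(table):
    """Distance for every ordered pair of projects, keyed by (project, project)."""
    names = sorted(table)
    return {(p, q): euclidean(table[p], table[q]) for p in names for q in names}

distances = pairwise_distances(table)
print(distances[("enwiki", "dewiki")] < distances[("enwiki", "specieswiki")])
```

A distance matrix of this kind would then be handed to a t-SNE implementation that accepts precomputed distances in order to obtain the 2D coordinates; that step is left out of the sketch.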


Wikidata Usage Tendency


The similarity structure in Wikidata usage across the semantic categories is presented. Each bubble represents a Wikidata semantic category. The size of the bubble reflects the volume of Wikidata usage from the respective category. If two categories are found in proximity, the projects that tend to use one of them also tend to use the other, and vice versa. Similarly to the Usage Overview, the 2D mapping is obtained by performing a t-SNE dimensionality reduction of the pairwise category Euclidean distances derived from the Projects x Categories contingency table.
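
For category distances, the same contingency table is read column-wise: each category is described by its usage across the projects. A short Python sketch under made-up data (the table and the helper name `category_distance` are assumptions for illustration):

```python
import math

# Hypothetical Projects x Categories contingency table (made up).
projects = ["enwiki", "dewiki", "specieswiki"]
categories = ["Human", "Book", "Taxon"]
table = [
    [900, 800, 50],   # enwiki
    [850, 780, 40],   # dewiki
    [10, 20, 700],    # specieswiki
]

# Transpose the table: one usage profile (across projects) per category.
profiles = dict(zip(categories, zip(*table)))

def category_distance(a, b):
    """Euclidean distance between the project-wise profiles of two categories."""
    return math.dist(profiles[a], profiles[b])

# Categories used by the same projects end up closer to each other.
print(category_distance("Human", "Book") < category_distance("Human", "Taxon"))
```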


Wikidata Usage Distribution


The plots help build an understanding of the relative range of Wikidata usage across the client projects. In the Project Usage Rank-Frequency plot, each point represents a client project; Wikidata usage is represented on the vertical axis and the project usage rank on the horizontal axis, while only the top projects (per project type) are labeled. The highly skewed, asymmetrical distribution reveals that only a small fraction of client projects accounts for a huge proportion of Wikidata usage.
In the Project Usage log(Rank)-log(Frequency) plot, the logarithms of both variables are represented. A power-law relationship holds true if this plot is linear. The plot includes the best linear fit; however, no attempts to estimate the underlying probability distribution were made.
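
The linearity check can be sketched in Python as an ordinary least-squares fit of log(frequency) on log(rank). The function names and the synthetic data are assumptions for illustration; the dashboard's own fitting code may differ.

```python
import math

def rank_frequency(usage):
    """Sort usage volumes in decreasing order and pair each with its rank (1 = highest)."""
    ordered = sorted(usage, reverse=True)
    return list(enumerate(ordered, start=1))

def loglog_fit(pairs):
    """Least-squares line through the points (log rank, log frequency).

    Requires strictly positive frequencies, since log(0) is undefined.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic usage that follows an exact power law: frequency = 10000 / rank.
usage = [10000 / rank for rank in range(1, 51)]
slope, intercept = loglog_fit(rank_frequency(usage))
print(slope, intercept)
```

For data that follow frequency = C * rank^(-a), the fitted slope approximates -a and the intercept approximates log(C). A good linear fit is suggestive of a power law but, as noted above, is not an estimate of the underlying probability distribution.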


Client Project Types


Project types are provided in the rows of this chart, while the semantic categories are given on the horizontal axis. The height of the respective bar indicates Wikidata usage volume from the respective semantic category in a particular client project type.


Client Projects Usage Volume


Use the slider to select the percentile rank range of the Wikidata usage volume distribution across the client projects to show. The chart will automatically adjust to present the selected projects in increasing order of Wikidata usage, showing at most the 30 top projects from the selection. NOTE. The percentile rank of a score is the percentage of scores in its frequency distribution that are equal to or lower than it. For example, a client project that has a Wikidata usage volume greater than or equal to 75% of all client projects under consideration is said to be at the 75th percentile, where 75 is the percentile rank.
In effect, you can browse the whole distribution of Wikidata usage across the client projects by selecting the lower and upper limits in terms of usage percentile rank.
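
The percentile rank definition and the slider selection can be sketched in Python as follows. The function names, the inclusive treatment of both slider limits, and the sample volumes are assumptions for this example.

```python
def percentile_ranks(volumes):
    """Map each project to the percentage of projects whose volume is <= its own."""
    values = list(volumes.values())
    n = len(values)
    return {
        project: 100.0 * sum(1 for other in values if other <= volume) / n
        for project, volume in volumes.items()
    }

def select_projects(volumes, lower, upper, limit=30):
    """Projects with percentile rank in [lower, upper]: keep the top `limit`
    by usage volume, then return them in increasing order of usage."""
    ranks = percentile_ranks(volumes)
    selected = [p for p in volumes if lower <= ranks[p] <= upper]
    top = sorted(selected, key=lambda p: volumes[p], reverse=True)[:limit]
    return sorted(top, key=lambda p: volumes[p])

# Made-up usage volumes for four hypothetical projects.
volumes = {"a": 10, "b": 20, "c": 30, "d": 40}
print(percentile_ranks(volumes))
print(select_projects(volumes, 50, 100))
```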


Tabs/Crosstabs


A breakdown of Wikidata usage statistics across client projects and semantic categories. The first table presents a Client Project vs. Semantic Category cross-tabulation. The Usage column in this table is the Wikidata usage statistic for a particular Semantic Category x Client Project combination (e.g. the Wikidata usage in the category "Human" in the dewiki project). The second table presents the total Wikidata usage per client project (i.e. the sum of Wikidata usage across all semantic categories for a particular client project; e.g. the total Wikidata usage volume of enwiki).
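
The relationship between the two tables, together with the sort and search interactions, can be sketched in Python. The row layout, the helper names, and all numbers are made up for illustration and are not taken from the dashboard.

```python
from collections import defaultdict

# Hypothetical cross-tabulation rows: (project, project type, category, usage).
crosstab = [
    ("enwiki", "Wikipedia", "Human", 900),
    ("enwiki", "Wikipedia", "Taxon", 300),
    ("dewiki", "Wikipedia", "Human", 850),
    ("dewikivoyage", "Wikivoyage", "Geographical Object", 40),
]

def totals_per_project(rows):
    """Second table: sum usage across all semantic categories per project."""
    totals = defaultdict(int)
    for project, _project_type, _category, usage in rows:
        totals[project] += usage
    return dict(totals)

def search(rows, term):
    """Case-insensitive substring match on project, project type, or category."""
    term = term.lower()
    return [row for row in rows if any(term in field.lower() for field in row[:3])]

# Sorting by the Usage column, largest first.
by_usage = sorted(crosstab, key=lambda row: row[3], reverse=True)

print(totals_per_project(crosstab))
print(search(crosstab, "human"))
```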







WDCM Navigation

Your orientation in the WDCM Dashboards System


  • WDCM Portal
    The entry point to WDCM Dashboards.

  • WDCM Overview
    The big picture. Fundamental insights into how Wikidata is used across the client projects.

  • WDCM Semantics
    Detailed insights into the WDCM Taxonomy (a selection of semantic categories from Wikidata), its distributional semantics, and the way it is used across the client projects. If you are looking for Topic Models - that’s where they live.

  • WDCM Usage
    Fine-grained information on Wikidata usage across client projects and project types. Cross-tabulations and similar.

  • WDCM Geo
    Wikidata items interactive maps.

  • WDCM Structure
    A method to investigate the WDCM Taxonomy and improve the choice of items that undergo analyses.

  • WDCM Biases
    The WDCM gender bias and north-south divide statistics.

  • WDCM (S)itelinks
    The WDCM (S)itelinks usage aspect statistics.

  • WDCM (T)itles
    The WDCM (T)itles usage aspect statistics.


  • WDCM System Technical Documentation
    The WDCM Wikitech Page.

  • WDCM Wikidata Project Page
    The WDCM Wikidata Project Page.

  • The WDCM Journal
    A regularly updated selection of the most interesting empirical findings from wmdeanalytics.



