02/09/2020: Timetable updated!
15/06/2020: Trial Data are ready! Download them here!
14/06/2020: Join our Google group to receive updates and to submit questions and comments!
05/05/2020: If you want to participate in the challenge, fill in this form
16/03/2020: We are now online!
The task CONcreTEXT (so dubbed after CONcreteness in conTEXT) focuses on automatic concreteness (and conversely, abstractness) recognition. Given a sentence along with a target word, we ask participants to propose a system able to assess the concreteness of the target word according to a [1-7] concreteness scale, where 1 stands for fully abstract (e.g., ‘idempotence’) and 7 for maximally concrete (e.g., ‘car’).
The concreteness score being assigned to the word must be evaluated in context: the word should not be considered in isolation, but as part of the given sentence. For example, systems are expected to assign different scores to the verb ‘COVER’ in the next two sentences:
- COVER the pot and bring the water to a vigorous boil;
- Your fees and tuition help to COVER the costs of providing these services.
Target words may be either verbs or nouns.
We invite participants to exploit all possible strategies to solve the task, including (but not limited to) knowledge bases, external training data, word embeddings, etc.
Motivation and state of the art
Ordinary experience suggests that the semantic representation, lexical access and processing of concepts can be affected by their concrete/abstract status: concrete meanings, being closer to perceptual experience, are acknowledged to be conveyed more quickly and easily in human communication than abstract meanings (Bambini et al., 2014). This kind of information captures a complex combination of experiential (e.g., sensory, motor) and strictly linguistic features, such as verbal associations arising through co-occurrence patterns and syntactic information (Vigliocco et al., 2009). These features make conceptual concreteness/abstractness a challenging though only superficially explored field, with the notable exception of some works at the intersection between Computational Linguistics and Cognitive Science, such as Hill et al. (2013) and Hill and Korhonen (2014). In the last few years, mounting experimental evidence has been gathered in Neuroscience and Cognitive Science on conceptual access and retrieval dynamics, raising novel issues such as the imageability associated with terms and concepts (Vergallito et al., 2019) and the traits contributing to inferential vs. referential lexical tasks (Marconi, 1997; Garbarini et al., 2020).
The CONcreTEXT task aims to investigate how concreteness information affects sense selection: unlike past research, we are interested in assessing the concreteness of terms in context rather than in isolation (Brysbaert et al., 2014; Montefinese et al., 2014). The concreteness score is assumed to be a property of word meanings rather than of word forms; thus, scoring the concreteness of a term in context implicitly requires identifying its underlying sense, by handling lexical phenomena such as polysemy and homonymy.
Reference scientific communities
Information on conceptual concreteness bears on many diverse tasks and fields. As mentioned, this task addresses an old problem, word sense disambiguation, from a novel perspective, even though sense identification and grounding are intentionally left unspecified (that is, participants are not requested to specify what the target means by providing a WordNet/BabelNet synset identifier). Additionally, the concrete/abstract nature of senses may be relevant to further NLP tasks, such as the semantic processing of figurative language (Birke and Sarkar, 2006; Neuman et al., 2013; Mensa et al., 2018), automatic translation and simplification (Zhu et al., 2010), the characterisation of web queries with difficulty scores (Xing et al., 2010), and the processing of social tagging information (Benz et al., 2011). Furthermore, the task will be useful to test contextualized models (such as those descending from Peters et al., 2018 and Devlin et al., 2019), which were proposed to extend traditional embeddings by also extracting context-sensitive features.
The CONcreTEXT task may also be relevant for the psycholinguistics community, where ratings of concreteness, imageability and other features are widely used as control variables in experiments. The resulting annotated dataset itself (for both Italian and English) will be a resource for future research focused on concreteness in a more contextual, and thus ecological, setting.
Definition of concreteness
Operationally, the very first issue is that concreteness/abstractness is not straightforward to define (Iliev and Axelrod, 2017). Although finer-grained distinctions among abstract and concrete word meanings can be drawn, the term ‘concrete’ has two main interpretations:
- what is closer to perception (as opposed to what cannot be experienced directly through the senses);
- what is more specific (as opposed to high-level, abstract).
We are mostly interested in the first aspect, that is, perceptually salient concreteness/abstractness.
The dataset used for this task will be taken from the English-Italian parallel section of The Human Instruction Dataset (Chocron and Pareti, 2018), derived from WikiHow instructions. The dataset is freely available on Kaggle. All documents have been anonymized beforehand, so the downloaded data present no privacy or data sensitivity issues.
The dataset will comprise 1,000 sentences overall: 500 Italian sentences and 500 English sentences. For each sentence a target term will be selected, and multiple annotators will be asked to assign it a concreteness score (1-7 scale). After the annotation, inter-rater agreement will be computed and items with low agreement will be dropped, so as to deliver reliable data. Human ratings will be averaged, and the resulting figures will be used as the gold standard.
The dataset will be split into trial and test data, in a 20-80 proportion. Trial data will be released with the concreteness scores; test data will be delivered without scores and will be the object of evaluation.
The dataset will be released as tab-separated files (one for Italian, one for English) containing these fields:
- TARGET: the lemma of the target word your system should assign a concreteness score to;
- POS: the part-of-speech tag of the target word;
- INDEX: the index of the target word in the proposed sentence;
- TEXT: the sentence containing the target to be evaluated.
- SCORE: empty field to be filled with your assigned concreteness score (from 1 to 7). (TRIAL RELEASE ONLY)
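As a rough illustration of the file layout described above, the following sketch reads such a tab-separated file into records. It assumes a header row carrying the five field names, a zero-based INDEX over whitespace-separated tokens, and an empty SCORE cell when no score is given; none of these details is confirmed by the task description, and the two sample rows (including the score value) are made up.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional, TextIO


@dataclass
class Item:
    target: str
    pos: str
    index: int
    text: str
    score: Optional[float]  # None when the SCORE cell is empty


def read_items(handle: TextIO) -> List[Item]:
    """Parse a tab-separated file with a header row into Item records."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    items = []
    for row in reader:
        raw_score = (row.get("SCORE") or "").strip()
        items.append(
            Item(
                target=row["TARGET"],
                pos=row["POS"],
                index=int(row["INDEX"]),
                text=row["TEXT"],
                score=float(raw_score) if raw_score else None,
            )
        )
    return items


# Hypothetical sample rows, NOT taken from the released data.
SAMPLE = (
    "TARGET\tPOS\tINDEX\tTEXT\tSCORE\n"
    "cover\tV\t0\tCover the pot and bring the water to a vigorous boil\t6.1\n"
    "cover\tV\t6\tYour fees and tuition help to cover the costs\t\n"
)

items = read_items(io.StringIO(SAMPLE))
for item in items:
    print(item.target, item.index, item.score)
```

For a real file, `read_items(open(path, encoding="utf-8", newline=""))` would be the equivalent call, where `path` is whatever name the released file has.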
Participants’ output will be evaluated by measuring its correlation with human ratings, through Pearson and Spearman coefficients.
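To make the two measures concrete, here is a minimal, dependency-free sketch of Pearson and Spearman correlation between gold and predicted scores. It is not the organisers' evaluation script; the function names and the sample scores are illustrative assumptions. Spearman is computed as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for illustration only.
gold = [1.5, 3.0, 4.2, 6.8]
predicted = [2.0, 2.5, 5.0, 6.0]
print("Pearson:", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
```

In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities; the hand-written version is used here only to keep the sketch self-contained.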
The baseline will be implemented by considering the concreteness scores provided in the literature: Brysbaert et al. (2014) for English and Montefinese et al. (2014) for Italian.
Let L denote the language (Italian or English). In both languages, given a sentence S_L composed of N words, we will compute the concreteness score of the target word t as the average concreteness of the sentence. The underlying assumption is that concrete senses typically co-occur with concrete ones, and the same holds for abstract senses (Frassinelli et al., 2017).
Then C_t, the concreteness of the target word t ∈ S_L, will be computed by averaging the scores associated with all the lexical items w_1, ..., w_N contained in S_L: $C_t = \frac{1}{N} \sum_{i=1}^{N} C_{w_i}$
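The averaging step can be sketched as follows. This is an illustration under stated assumptions, not the organisers' implementation: `baseline_score` and `score_of` are hypothetical names, the sentence is assumed to be already tokenized, and the toy norm values are invented rather than taken from the published ratings.

```python
from typing import Callable, Dict, Sequence


def baseline_score(tokens: Sequence[str], score_of: Callable[[str], float]) -> float:
    """Return (1/N) * sum of score_of(w) over the N tokens of the sentence.

    Every target word in the same sentence receives this same value.
    """
    if not tokens:
        raise ValueError("cannot score an empty sentence")
    return sum(score_of(w) for w in tokens) / len(tokens)


# Invented ratings for illustration only.
toy_norms: Dict[str, float] = {"cover": 4.0, "the": 1.5, "pot": 5.0}

sentence = ["cover", "the", "pot"]
score = baseline_score(sentence, lambda w: toy_norms[w])
print(score)  # (4.0 + 1.5 + 5.0) / 3
```

Passing the lookup as a function keeps the averaging independent of how scores are obtained, so a lookup that handles words missing from the norms can be plugged in without changing this step.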
For terms that have no annotated concreteness score in the considered sources (Brysbaert et al., 2014 and Montefinese et al., 2014), we will employ the concreteness score of the closest term, on the basis of vector representation, for which human ratings are available in those works. To this end, either ConceptNet Numberbatch (Speer et al., 2017) or FastText (Bojanowski et al., 2016) word embeddings will be used.
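One possible reading of this fallback is sketched below. The text does not say how "closest" is measured; cosine similarity is assumed here. The function names, the two-dimensional vectors and the ratings are all invented for illustration, whereas real Numberbatch or FastText vectors have hundreds of dimensions and would be loaded from their distributed files.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_with_fallback(
    word: str,
    norms: Dict[str, float],
    vectors: Dict[str, Sequence[float]],
) -> float:
    """Return the rated score of `word`, or that of its nearest rated neighbour."""
    if word in norms:
        return norms[word]
    if word not in vectors:
        raise KeyError(f"no rating and no vector for {word!r}")
    query = vectors[word]
    # Only rated words that also have a vector can serve as neighbours.
    candidates = [w for w in norms if w in vectors]
    if not candidates:
        raise KeyError("no rated word has a vector")
    nearest = max(candidates, key=lambda w: cosine(query, vectors[w]))
    return norms[nearest]


# Invented toy data for illustration only.
toy_vectors = {"pot": (1.0, 0.1), "kettle": (0.9, 0.2), "idea": (0.1, 1.0)}
toy_ratings = {"pot": 5.0, "idea": 1.5}

print(score_with_fallback("kettle", toy_ratings, toy_vectors))  # unrated: borrows from "pot"
print(score_with_fallback("idea", toy_ratings, toy_vectors))    # rated: returned directly
```

The linear scan over all rated words is fine for a sketch; with a full vocabulary, a vectorised or approximate nearest-neighbour search would be the more practical choice.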
How to Participate
If you want to participate, follow these steps:
- Fill in the EVALITA form
- Join our Google group
- Download the trial data here
- More TBD!
Timetable
- 15th June 2020 (originally 29th May 2020): distribution of trial data
- 4th September 2020: EVALITA registration deadline
- 25th September - 2nd October 2020 (originally 11th - 17th September 2020): evaluation window and collection of participants' results
- 5th October 2020: assessment returned to participants
- 16th October 2020: deadline for submission of system description papers
- 6th November 2020: technical reports due to organisers (camera-ready)
- 27th November 2020: video presentations to the EVALITA chair
- 16th – 17th December 2020: EVALITA 2020
References
 Valentina Bambini, Donatella Resta, and Mirko Grimaldi. A dataset of metaphors from the Italian literature: exploring psycholinguistic variables and the role of context. PloS one, 9(9):1–13, 2014.
 Gabriella Vigliocco, Lotte Meteyard, Mark Andrews, and Stavroula Kousta. Toward a theory of semantic representation. Language and Cognition, 1(2):219–247, 2009.
 Felix Hill, Douwe Kiela, and Anna Korhonen. Concreteness and corpora: A theoretical and practical study. In Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pages 75–83, 2013.
Felix Hill and Anna Korhonen. Learning abstract concept embeddings from multi-modal data: Since you probably can’t see what I mean. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 255–265, 2014.
Alessandra Vergallito, Marco Alessandro Petilli, and Marco Marelli. Perceptual modality norms for 1,121 Italian words: A comparison with concreteness and imageability scores and an analysis of their impact in word processing tasks. Behavior Research Methods, pages 1–18, 2019.
 Diego Marconi. Lexical competence. MIT Press, 1997.
 Francesca Garbarini, Fabrizio Calzavarini, Matteo Diano, Monica Biggio, Carola Barbero, Daniele P Radicioni, Giuliano Geminiani, Katiuscia Sacco, and Diego Marconi. Imageability effect on the functional brain activity during a naming to definition task. Neuropsychologia, 137:107275, 2020.
Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. Concreteness ratings for 40,000 generally known English word lemmas. Behavior Research Methods, 46(3):904–911, 2014.
Maria Montefinese, Ettore Ambrosini, Beth Fairfield, and Nicola Mammarella. The adaptation of the Affective Norms for English Words (ANEW) for Italian. Behavior Research Methods, 46(3):887–903, 2014.
 Julia Birke and Anoop Sarkar. A clustering approach for nearly unsupervised recognition of nonliteral language. In Procs. of the 11th conference of EACL, 2006.
 Yair Neuman, Dan Assaf, Yohai Cohen, Mark Last, Shlomo Argamon, Newton Howard, and Ophir Frieder. Metaphor identification in large texts corpora. PloS one, 8(4):e62343, 2013.
 Enrico Mensa, Aureliano Porporato, and Daniele P. Radicioni. Grasping metaphors: Lexical semantics in metaphor analysis. In The Semantic Web: ESWC 2018 Satellite Events, pages 192–195, Cham, 2018. Springer. ISBN 978-3-319-98192-5.
Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. A monolingual tree-based translation model for sentence simplification. In Procs. of the 23rd international conference on computational linguistics, pages 1353–1361. ACL, 2010.
 Xing Xing, Yi Zhang, and Mei Han. Query difficulty prediction for contextual image retrieval. In European Conference on Information Retrieval, pages 581–585, 2010.
 Dominik Benz, Christian Körner, Andreas Hotho, Gerd Stumme, and Markus Strohmaier. One tag to bind them all: Measuring term abstractness in social metadata. In Procs. of ESWC, pages 360–374. Springer, 2011.
 Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of NAACL-HLT, pages 2227–2237, 2018.
 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
 Rumen Iliev and Robert Axelrod. The paradox of abstraction: Precision versus concreteness. Journal of psycholinguistic research, 46(3):715–729, 2017.
 Paula Chocron and Paolo Pareti. Vocabulary alignment for collaborative agents: a study with real-world multilingual how-to instructions. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-18, pages 159–165. International Joint Conferences on Artificial Intelligence Organization, 7 2018. doi: 10.24963/ijcai.2018/22. URL https://doi.org/10.24963/ijcai.2018/22.
Diego Frassinelli, Daniela Naumann, Jason Utt, and Sabine Schulte im Walde. Contextual characteristics of concrete and abstract words. In IWCS 2017, 2017.
Robert Speer, Joshua Chin, and Catherine Havasi. ConceptNet 5.5: An open multilingual graph of general knowledge. In AAAI, pages 4444–4451, 2017.
 Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606, 2016.
- Daniele Radicioni, Università di Torino
- Rossella Varvara, Università di Firenze
- Lorenzo Gregori, Università di Firenze
- Andrea Amelio Ravelli, Istituto di Linguistica Computazionale “A. Zampolli” (CNR) Pisa
- Maria Montefinese, Università di Padova