4 October 2007 / erikduval

I need help…

Indeed, I need help, and I think you may need it too…

If you are doing research on Technology Enhanced Learning, then do you know

  • which of your papers has been cited most often?
  • who has cited you most often?
  • which papers cited a particular publication of yours?
  • whether more and more or fewer and fewer people are citing you over the years?
  • whose citing behavior is close to yours?
  • which conference or journal contains most citations of your papers?
  • which conference or journal you cite most often?
  • etc.

If Google can compute a PageRank for every page on the (non-hidden) web, then why can’t we have a CiteRank for every paper in our field? How about making that rank higher for every citation, and more so if the citing paper was itself often cited?
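As a rough illustration of that idea, here is a minimal sketch of a PageRank-style iteration over a citation graph. The name `cite_rank`, the damping factor of 0.85 and the toy graph are all assumptions made for this example; this is not an existing tool or a definitive ranking scheme.

```python
def cite_rank(citations, damping=0.85, iterations=50):
    """Rank papers by citations, PageRank-style.

    citations: dict mapping a paper id to the list of paper ids it cites.
    Returns a dict mapping each paper id to a score; scores sum to 1.
    """
    # Collect every paper that cites or is cited.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its score on to the papers it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                # A paper that cites nothing spreads its score evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical): D is cited by the well-cited C, F by the uncited E.
graph = {"A": ["C"], "B": ["C"], "C": ["D"], "E": ["F"]}
for paper, score in sorted(cite_rank(graph).items(), key=lambda kv: -kv[1]):
    print(paper, round(score, 3))
```

In this toy graph D and F each receive exactly one citation, but D ends up ranked higher because the paper citing it is itself cited twice.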

For many of us, these may be “academic questions”, but they are not irrelevant. Citations are the currency of research – a bit like links are the currency of the web. And in some fields, this currency is well managed and accessible to most of the researchers – it seems to me that PubMed acts that way for medicine, and maybe arXiv does for physics.

Of course, general citation indexes like Web of Science are supposed to address this problem, but they are terribly inadequate for our research domain. Web of Science is relatively easy to search, as I can include my affiliation in my search (and I haven’t changed affiliations, which is probably an exception!), but it includes only 13 of my papers and finds only 7 citations of those. The numbers do grow every year, so maybe there is reason to be optimistic 🙂

Somewhat more representative of our domain may be CiteSeer: I am not sure how many of my papers it includes (I can’t seem to search by author?!), but it does find 50 citations to them. Finding them requires browsing through many pages of citations of some paper by some Duval, so it is not very easy to use or process.

DBLP nicely presents a number of different authors whose name resembles mine. It lists 50 of my papers and presents an informative co-author list, but it does not include any information about citations.

Google Scholar seems to include most of my papers and many more references than the previous systems, but there are quite a few doubles and results from (very) gray literature. Also, it is difficult to distinguish self-citations from those by external authors. In a simple exploration, Xavier analysed the Google Scholar results for our “Metadata Principles and Practicalities” paper: the site lists 136 references, of which 38 were self-references and 10 were unusable, so 88 were left. Most of those 88 just had the author and the title of the paper (or master’s thesis or technical report). So, Xavier searched the web and completed the information. As Xavier mentioned to me: “The whole process took me 3 and a half hours and was not fun.”

Bibsonomy claims 235 papers by me, and does not include citation information.

There must be a better way! Maybe you have a suggestion? I can’t believe that all you smart people haven’t come up with a way better approach to managing whom you cite, who cites you, etc…

4 Comments

  1. Grainne Conole / Oct 5 2007 6:22 am

    Totally agree Erik – this is a *real* issue – ‘fraid I haven’t got the answer though. In the UK we are just about to enter our next Research Assessment Exercise and therefore we have been worrying about all of this for some time, along with the dreaded ‘indicators of esteem’. The other issue is how the blogosphere fits into all of this – as some are increasingly turning to dialogue through their blogs, how does this fit alongside traditional peer-assessed papers?

    By the way, have you come across the ‘H-number’, another one of these citation things…

  2. Erik / Oct 8 2007 3:36 pm

    I actually would be quite interested to hear more about the RAE process: how transparent is that? How does it compare to what is being used in other countries?

    The H-number is one of the many figures that the tool at http://www.harzing.com/pop.htm generates: as explained at http://en.wikipedia.org/wiki/Hirsch_number, “The h-index is an index that attempts to quantify the scientific productivity and impact of a scientist based on his/her most quoted papers. A scientist with an H-index of X has published X papers that have at least X citations.” According to Google Scholar, I have an h-index of 13; according to Web of Science, it is 1 😉
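To make that definition concrete, here is a small sketch that computes the h-index as the largest X for which X papers have at least X citations each. The function name `h_index` and the citation counts in the example are made up for illustration; they are not my actual numbers.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for position, citations in enumerate(counts, start=1):
        if citations >= position:
            # The top `position` papers all have at least `position` citations.
            h = position
        else:
            break
    return h


# Hypothetical citation counts for five papers.
print(h_index([25, 8, 5, 3, 3]))
```

The result depends entirely on which citations the underlying index has found, which is why the same author can get very different values from different sources.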

  3. lvh / Oct 9 2007 11:45 pm

    I have no clue.

    However, I do know that your citeseer link is broken😉

    In a vague attempt to conjure up something which might work (and could be easily automated):

    Is it safe to assume that anywhere your name occurs in full text search but does not occur in the list of authors, the document cites you? If so, would you be happy knowing just the number of papers you are cited in (i.e., being cited in two footnotes and a reference list counts as 1 citation), or do you also want to know how many times you are cited in each paper (and then sum the total, giving you the total number of times you are cited across all your papers)?

    I realise the latter has a lot more value, but it would also require (I think) a lot more work.

    If you are happy with the amount Bibsonomy indexes, you could try looking at the API (beta, but isn’t everything these days?). The ListOfAllPosts method (it’s a method, right? I don’t know any Java) accepts a search string (which I can only assume is full text search) and returns whatever you want, including XML.

    You do the full text search for “Erik Duval” (do people always quote you like that? how about minor roles in a paper, when you’re cited as the et al. in “John Doe, et al.”?) and then you do a grep -v on the author field for Erik Duval. That way, hopefully, the entries you have left are all citations. I think😉
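A minimal sketch of that filter, assuming the search results have already been fetched and parsed into plain dictionaries. The record shape (`title`, `authors`, `fulltext`) and the function names are hypothetical and are not the Bibsonomy API. The name matching is deliberately naive, so it will miss "Duval, E." and "et al." cases.

```python
def find_citing_entries(entries, name):
    """Keep entries whose full text mentions `name` but whose authors do not."""
    needle = name.lower()
    citing = []
    for entry in entries:
        in_text = needle in entry.get("fulltext", "").lower()
        # Equivalent of `grep -v` on the author field.
        is_author = any(needle in a.lower() for a in entry.get("authors", []))
        if in_text and not is_author:
            citing.append(entry)
    return citing


def count_mentions(entries, name):
    """Total number of times `name` appears across the entries' full text."""
    needle = name.lower()
    return sum(e.get("fulltext", "").lower().count(needle) for e in entries)


# Hypothetical records standing in for parsed search results.
results = [
    {"title": "Own paper", "authors": ["Erik Duval", "Wayne Hodgins"],
     "fulltext": "Erik Duval and Wayne Hodgins describe ..."},
    {"title": "Citing paper", "authors": ["John Doe"],
     "fulltext": "As Erik Duval argues ... see also Erik Duval (2002)."},
    {"title": "Unrelated paper", "authors": ["Jane Roe"],
     "fulltext": "No mention here."},
]

citing = find_citing_entries(results, "Erik Duval")
print(len(citing))                           # number of papers citing you
print(count_mentions(citing, "Erik Duval"))  # number of mentions in those papers
```

The first count corresponds to "cited or not" per paper, the second to the total number of mentions.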

    I’m probably not making any sense. I’m sick as a dog: all codeine and no caffeine make Laurens a strange boy.

  4. lvh / Oct 10 2007 12:07 am

    My last post wasn’t quite complete.

    The method I talked about has its own XML schema. I’ve just looked at that, and I noticed it also defines how a document’s BibTeX parts are described. A good thing, at first sight, but:

    1. Can it rule out full text search? Depends on how well it indexes BibTeX parts of documents.
    2. If you keep the full text search, how do you weed out the doubles? This isn’t important for my “first” method (see other post) because it simply says if you were cited or not, not how many times you were cited. However, this clearly makes keeping the number for the second method accurate a complete nightmare😦
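One simple way to weed out doubles is to compare entries on a normalized title and keep only the first of each. The sketch below assumes each entry is a dictionary with a `title` key; that shape and the function names are hypothetical. Real duplicates often differ by more than case and punctuation, and this normalization drops non-ASCII letters, so treat it as a starting point only.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def drop_doubles(entries):
    """Keep the first entry for each normalized title, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        key = normalize_title(entry["title"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


# Hypothetical entries: the first two are the same paper, formatted differently.
found = [
    {"title": "Metadata Principles and Practicalities"},
    {"title": "metadata principles and practicalities."},
    {"title": "Another Paper Entirely"},
]
print(len(drop_doubles(found)))
```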
