API

doi.pdf_to_doi(filepath, maxlines=None)

Try to extract a DOI from the file at filepath. The function searches the file's binary data with a regular expression and returns the first DOI it finds, on the assumption that the first match is the document's own DOI.

Parameters
  • filepath (str) – Path to the pdf file.

  • maxlines (Optional[int]) – Maximum number of lines to check. Searching an entire document can take a long time, and a DOI that appears in the middle of a document is rarely the document's own DOI.

Return type

Optional[str]

Returns

DOI or None.
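
The search described above can be sketched with the standard library alone. This is an illustration of the technique, not the library's implementation: the function name first_doi_in_file and the simplified pattern DOI_PATTERN are assumptions, and the regular expression actually used by doi.pdf_to_doi may differ.

```python
import re
from typing import Optional

# Assumed, simplified DOI pattern: "10.", a 4-9 digit registrant code,
# a slash, then a suffix that ends at whitespace or a common delimiter.
DOI_PATTERN = re.compile(rb'10\.\d{4,9}/[^\s"<>)]+')


def first_doi_in_file(filepath: str, maxlines: Optional[int] = None) -> Optional[str]:
    """Return the first DOI-like match found in the first `maxlines` lines."""
    with open(filepath, "rb") as handle:
        # Binary files are still iterated line by line (split on b"\n").
        for lineno, line in enumerate(handle, start=1):
            if maxlines is not None and lineno > maxlines:
                break  # stop early instead of scanning the whole file
            match = DOI_PATTERN.search(line)
            if match:
                return match.group(0).decode("ascii", errors="replace")
    return None
```

Stopping at maxlines trades completeness for speed: a DOI printed on the first page is usually found within the first lines of the file, while later matches tend to belong to cited works.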

doi.validate_doi(doi)

Check that the DOI can be resolved by official means. If it can, return the resolved URL; otherwise return None, which means the DOI is invalid.

Parameters

doi (str) – Identifier.

Return type

Optional[str]

Returns

The URL assigned to the DOI or None.
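
One way to resolve a DOI "by official means" is the doi.org handle API, which returns a JSON record for a registered DOI. The sketch below assumes that endpoint. The names resolve_doi and _fetch_handle_record are hypothetical, and the sketch is not necessarily how doi.validate_doi is implemented. The fetch parameter exists only so that the lookup can be replaced with a stub when no network is available.

```python
import json
from typing import Callable, Optional
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import urlopen


def _fetch_handle_record(doi: str) -> dict:
    """Fetch the handle record for `doi` from doi.org (needs network access)."""
    url = "https://doi.org/api/handles/" + quote(doi, safe="/")
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def resolve_doi(doi: str, fetch: Callable[[str], dict] = _fetch_handle_record) -> Optional[str]:
    """Return the URL registered for `doi`, or None if it cannot be resolved."""
    try:
        record = fetch(doi)
    except (HTTPError, URLError, ValueError):
        # Unknown DOI, network failure, or a malformed JSON response.
        return None
    for entry in record.get("values", []):
        if entry.get("type") == "URL":
            return entry.get("data", {}).get("value")
    return None
```

Calling resolve_doi with only a DOI string performs a real HTTP request; passing a stub as fetch keeps the parsing logic testable offline.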

doi.get_clean_doi(doi)

Check whether the given string is actually a URL and, if so, extract the bare DOI from it.

Parameters

doi (str) – String containing a DOI.

Return type

str

Returns

The extracted DOI.
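
As a rough illustration of this kind of cleanup, the sketch below strips a resolver URL or a "doi:" label from the front of a string. The helper name strip_doi_prefix and the list of recognised prefixes are assumptions made for the example; doi.get_clean_doi may handle more or different cases.

```python
import re
from urllib.parse import unquote

# Assumed prefixes: the doi.org / dx.doi.org resolvers and a "doi:" label.
_PREFIXES = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def strip_doi_prefix(value: str) -> str:
    """Return `value` without a leading resolver URL or "doi:" label."""
    # Decode percent-escapes first, so "10.1000%2Fxyz" becomes "10.1000/xyz".
    cleaned = unquote(value.strip())
    return _PREFIXES.sub("", cleaned)
```

A string that is already a bare DOI passes through unchanged, so the helper can be applied to user input without first checking its form.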

doi.find_doi_in_text(text)

Try to find a DOI in a text.

Parameters

text (str) – Text in which to look for a DOI.

Return type

Optional[str]

Returns

A DOI, if found, otherwise None.
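
A minimal sketch of this kind of lookup, assuming a simplified DOI pattern, is shown below. The name find_first_doi and the regular expression are illustrative only; the pattern used by doi.find_doi_in_text may be stricter or cover more notations.

```python
import re
from typing import Optional

# Assumed, simplified pattern: "10.", a 4-9 digit registrant code, a slash,
# then any run of characters up to whitespace, a quote, or an angle bracket.
DOI_IN_TEXT = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_first_doi(text: str) -> Optional[str]:
    """Return the first DOI-like substring of `text`, or None."""
    match = DOI_IN_TEXT.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not to the DOI.
    return match.group(0).rstrip(".,;)")
```

Trimming trailing punctuation is a heuristic: it helps with running prose but would also shorten the rare DOI that legitimately ends in one of those characters.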

doi.get_real_url_from_doi(doi)

Get a URL corresponding to a DOI.

Parameters

doi (str) – Identifier.

Return type

Optional[str]

Returns

A URL for the DOI, or None if the DOI is invalid.