API

doi.pdf_to_doi(filepath, maxlines=None)

Try to extract a DOI from the PDF file at filepath. The function searches the file's binary data with a regular expression and returns the first DOI found, on the assumption that it is the document's own DOI.

Parameters

filepath (str) – Path to the PDF file.
maxlines (Optional[int]) – Maximum number of lines to check. Scanning a whole document can take a long time, and DOIs that appear in the middle of a document tend not to be that document's own DOI.

Return type

Optional[str]

Returns

DOI or None.
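
The scan described above can be sketched with the standard library alone. This is an illustrative approximation, not the library's implementation: the name sketch_pdf_to_doi and the simplified DOI_PATTERN are assumptions, and the regex the library actually uses may differ.

```python
import re
from typing import Optional

# Hypothetical, simplified DOI pattern; the library's real regex may differ.
DOI_PATTERN = re.compile(rb'10\.\d{4,9}/[^\s"<>()]+')


def sketch_pdf_to_doi(filepath: str, maxlines: Optional[int] = None) -> Optional[str]:
    """Return the first DOI-like match in the raw bytes of a file, or None."""
    with open(filepath, "rb") as handle:
        for line_number, line in enumerate(handle):
            # Stop early once the line budget is used up.
            if maxlines is not None and line_number >= maxlines:
                break
            match = DOI_PATTERN.search(line)
            if match:
                return match.group(0).decode("ascii", errors="ignore")
    return None
```

In this sketch, a small maxlines keeps the search near the start of the file, which is where a document's own DOI is most likely to appear.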

doi.validate_doi(doi)

Check that the DOI can be resolved by official means. If it can, return the resolved URL; otherwise return None, which means the DOI is invalid.

Parameters: doi (str) – Identifier.
Return type: Optional[str]
Returns: The URL assigned to the DOI or None.
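
One way to resolve a DOI "by official means" is the handle API at https://doi.org/api/handles/. Whether the library uses this endpoint is an assumption; the sketch below only illustrates the idea, and the names sketch_validate_doi and url_from_handle_response are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def url_from_handle_response(payload: dict) -> Optional[str]:
    """Extract the URL value from a doi.org handle-API JSON payload."""
    # responseCode 1 means the handle was found.
    if payload.get("responseCode") != 1:
        return None
    for value in payload.get("values", []):
        if value.get("type") == "URL":
            return value.get("data", {}).get("value")
    return None


def sketch_validate_doi(doi: str, timeout: float = 10.0) -> Optional[str]:
    """Resolve a DOI through the doi.org handle API (needs network access)."""
    endpoint = "https://doi.org/api/handles/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(endpoint, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Network failures, HTTP errors and malformed JSON all count as
        # "could not be resolved" in this sketch.
        return None
    return url_from_handle_response(payload)
```

Keeping the payload parsing separate from the network call makes the parsing testable offline with a hand-written dictionary.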

doi.get_clean_doi(doi)

Check whether the given string is actually a URL and, if so, extract the bare DOI from it.

Parameters: doi (str) – String containing a DOI.
Return type: str
Returns: The extracted DOI.
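
A minimal sketch of this kind of cleanup, assuming the DOI may arrive wrapped in a resolver URL, with a "doi:" prefix, or percent-encoded. The name sketch_get_clean_doi and the simplified pattern are illustrative assumptions, not the library's code.

```python
import re
from urllib.parse import unquote

# Hypothetical, simplified DOI pattern; the library's real regex may differ.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def sketch_get_clean_doi(doi: str) -> str:
    """Strip a resolver URL or 'doi:' prefix and return the bare DOI."""
    # Decode percent-escapes such as %2F before searching.
    candidate = unquote(doi.strip())
    match = DOI_PATTERN.search(candidate)
    # Fall back to the input unchanged when nothing DOI-like is found.
    return match.group(0) if match else doi
```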

doi.find_doi_in_text(text)

Try to find a DOI in a piece of text.

Parameters: text (str) – Text in which to look for a DOI.
Return type: Optional[str]
Returns: A DOI, if found, otherwise None.
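
Searching free text raises one extra problem: a DOI is often followed directly by sentence punctuation. The sketch below shows one hedged way to handle that. The name sketch_find_doi_in_text, the pattern, and the trimming rule are all assumptions for illustration, and the trimming would wrongly shorten the rare DOI that legitimately ends in one of the stripped characters.

```python
import re
from typing import Optional

# Hypothetical, simplified DOI pattern; the library's real regex may differ.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def sketch_find_doi_in_text(text: str) -> Optional[str]:
    """Return the first DOI-like substring of text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation is usually part of the sentence, not the DOI.
    return match.group(0).rstrip(".,;:)")
```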

doi.get_real_url_from_doi(doi)

Get a URL corresponding to a DOI.

Parameters: doi (str) – Identifier.
Return type: Optional[str]
Returns: A URL for the DOI. If the DOI is invalid, return None.