Exploring the Molmass Python Package: An Awesome Tool for Chemists and Data Analysts

molmass is a Python package designed for chemists and researchers. It provides tools for calculating the molar mass, mass distribution, and isotopic patterns of chemical substances. Developed by Christoph Gohlke, this package is particularly useful for those involved in analytical chemistry, molecular modeling, and education.

Key Features of the molmass Python Package

  • Molar Mass Calculation: Quickly compute the molar mass of a chemical compound from its formula.
  • Formula Analysis: Break down chemical formulas to understand the composition of molecules, including atom count and percentage by mass.
  • Isotopic Pattern Simulation: Generate isotopic distribution patterns, which are crucial for interpreting mass spectrometry data.
  • Interactive Web Application: For users less comfortable with coding, molmass provides a web-based graphical user interface for easy access to its features.

Benefits for Chemists and Data Analysts

  • Efficiency in Data Analysis: Automate the calculation of molar masses and isotopic patterns, saving time in data processing.
  • Accuracy: Reduces the risk of human error in manual calculations.
  • Integration with Other Python Tools: Can be combined with other Python libraries for more comprehensive data analysis and visualization tasks.
  • Educational Resource: A great tool for teaching molecular mass concepts and isotopic distribution.

Getting Started with Molmass

Installation

molmass can be installed via pip:

pip install molmass

However, the above command installs only the core package. To also install the interactive web application and pandas (needed for tabular output), use the following command instead:

pip install -U molmass[all]

This command will install the molmass package and all dependencies.

It is recommended to install molmass in a virtual environment. For more information on how to create and manage virtual environments, see Python’s official documentation.

Dictionary of elements

You can have quick access to physicochemical and descriptive properties of chemical elements using the ELEMENTS dictionary.

from molmass import ELEMENTS
hydrogen = ELEMENTS["H"]
hydrogen
Element(
    1, 'H', 'Hydrogen',
    group=1, period=1, block='s', series=1,
    mass=1.007941, eleneg=2.2, eleaffin=0.75420375,
    covrad=0.32, atmrad=0.79, vdwrad=1.2,
    tboil=20.28, tmelt=13.81, density=0.084,
    eleconfig='1s',
    oxistates='1*, -1',
    ionenergy=(13.5984,),
    isotopes={
        1: Isotope(1.00782503223, 0.999885, 1),
        2: Isotope(2.01410177812, 0.000115, 2),
    },
)

Note that each element is represented by an instance of the Element class. This way, each property can be accessed as an attribute of the element:

hydrogen.protons
1
hydrogen.isotopes
{1: Isotope(mass=1.00782503223, abundance=0.999885, massnumber=1, charge=0),
 2: Isotope(mass=2.01410177812, abundance=0.000115, massnumber=2, charge=0)}

For heavier elements, the package can be a useful source of electron configuration data, in both the condensed and the orbital notation:

silver = ELEMENTS["Ag"]
silver
Element(
    47, 'Ag', 'Silver',
    group=11, period=5, block='d', series=8,
    mass=107.8682, eleneg=1.93, eleaffin=1.30447,
    covrad=1.34, atmrad=1.75, vdwrad=1.72,
    tboil=2436.0, tmelt=1235.1, density=10.49,
    eleconfig='[Kr] 4d10 5s',
    oxistates='2, 1*',
    ionenergy=(7.5762, 21.49, 34.83),
    isotopes={
        107: Isotope(106.9050916, 0.51839, 107),
        109: Isotope(108.9047553, 0.48161, 109),
    },
)
silver.eleconfig
'[Kr] 4d10 5s'
silver.eleconfig_dict
{(1, 's'): 2,
 (2, 's'): 2,
 (2, 'p'): 6,
 (3, 's'): 2,
 (3, 'p'): 6,
 (3, 'd'): 10,
 (4, 's'): 2,
 (4, 'p'): 6,
 (4, 'd'): 10,
 (5, 's'): 1}
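
The dictionary form lends itself to simple checks. The following is a minimal sketch in plain Python, with the dictionary copied by hand from the output above rather than read from molmass; the variable names are illustrative. It confirms that the electron counts add up to the atomic number of silver and totals the electrons in each shell.

```python
# Electron configuration of silver, copied from the eleconfig_dict output above.
# Keys are (shell number, subshell letter); values are electron counts.
silver_config = {
    (1, "s"): 2, (2, "s"): 2, (2, "p"): 6,
    (3, "s"): 2, (3, "p"): 6, (3, "d"): 10,
    (4, "s"): 2, (4, "p"): 6, (4, "d"): 10,
    (5, "s"): 1,
}

# For a neutral atom, the total should equal the atomic number (47 for Ag).
total_electrons = sum(silver_config.values())

# Electrons per shell, aggregated over subshells.
electrons_per_shell = {}
for (shell, _subshell), count in silver_config.items():
    electrons_per_shell[shell] = electrons_per_shell.get(shell, 0) + count

print(total_electrons)
print(electrons_per_shell)
```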

Class Formula

One of the most impressive features of molmass is the Formula class. It allows you to analyze chemical formulas and calculate their molar mass, mass distribution, and isotopic patterns. Let’s see how it works.

from molmass import Formula
water = Formula("H2O")
water
Formula('H2O')

Checking the number of atoms in a formula is easy:

water.atoms
3

The same goes for the charge:

water.charge
0

For mass, we have three possibilities:

  • mass: the average relative molecular mass considering the natural abundance of isotopes. Equals the molar mass in g/mol.
  • monoisotopic_mass: the mass of the molecule considering only the most abundant isotope of each element.
  • nominal_mass: the total number of protons and neutrons in the isotopologue composed of the most abundant isotope of each element.

water.mass
18.015287
water.monoisotopic_mass
18.01056468403
water.nominal_mass
18
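
To make the three definitions concrete, here is a minimal sketch that recomputes them for water in plain Python. It does not use molmass: the hydrogen isotope data are copied from the ELEMENTS["H"] output shown earlier, the oxygen isotope data are assumed literature values, and the function names are illustrative.

```python
# Isotope data as {mass number: (exact mass in u, natural abundance)}.
# Hydrogen values are copied from the ELEMENTS["H"] output above;
# oxygen values are assumed here for illustration.
ISOTOPES = {
    "H": {1: (1.00782503223, 0.999885), 2: (2.01410177812, 0.000115)},
    "O": {
        16: (15.99491461957, 0.99757),
        17: (16.99913175650, 0.00038),
        18: (17.99915961286, 0.00205),
    },
}
water_counts = {"H": 2, "O": 1}


def average_mass(counts):
    """Abundance-weighted mass, summed over all atoms."""
    return sum(
        n * sum(mass * abundance for mass, abundance in ISOTOPES[el].values())
        for el, n in counts.items()
    )


def most_abundant(element):
    """Return (mass number, exact mass) of the most abundant isotope."""
    number, (mass, _) = max(ISOTOPES[element].items(), key=lambda kv: kv[1][1])
    return number, mass


def monoisotopic_mass(counts):
    """Sum of exact masses of the most abundant isotope of each atom."""
    return sum(n * most_abundant(el)[1] for el, n in counts.items())


def nominal_mass(counts):
    """Sum of mass numbers of the most abundant isotope of each atom."""
    return sum(n * most_abundant(el)[0] for el, n in counts.items())


water_average = average_mass(water_counts)
water_monoisotopic = monoisotopic_mass(water_counts)
water_nominal = nominal_mass(water_counts)
print(water_average, water_monoisotopic, water_nominal)
```

Under these assumptions the results agree with the molmass values above to within rounding of the tabulated data.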

We can obtain the elemental composition of a formula using the composition method:

water.composition()
print(water.composition())
Element  Count  Relative mass  Fraction %
H            2       2.015882     11.1898
O            1      15.999405     88.8102

It is often more useful in tabular form, which can be obtained as a pandas DataFrame:

water.composition().dataframe()
         Count  Relative mass  Fraction
Element
H            2       2.015882  0.111898
O            1      15.999405  0.888102
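
The Fraction column is each element's relative mass divided by the total. As a minimal sketch in plain Python (not molmass code; the atomic masses are taken from the table above and the function name is illustrative), the calculation looks like this:

```python
# Average atomic masses in u, taken from the composition table above
# (relative mass divided by atom count).
ATOMIC_MASS = {"H": 2.015882 / 2, "O": 15.999405}


def mass_fractions(counts):
    """Return {element: (relative mass, mass fraction)} for a composition."""
    relative = {el: n * ATOMIC_MASS[el] for el, n in counts.items()}
    total = sum(relative.values())
    return {el: (m, m / total) for el, m in relative.items()}


water_fractions = mass_fractions({"H": 2, "O": 1})
for element, (mass, fraction) in water_fractions.items():
    print(f"{element}: {mass:.6f} u, {100 * fraction:.4f} %")
```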

Isotopes

One thing that makes molmass stand out is its ability to calculate isotopic patterns. This is particularly useful for mass spectrometry data analysis. First, let’s see how the package deals with isotopes, even recognizing the most common isotopes of each element by their symbols:

# deuterated water (heavy water)
d_water = Formula("D2O")
d_water
Formula('[2H]2O')
d_water.mass
20.02760855624
d_water.composition().dataframe()
         Count  Relative mass  Fraction
Element
2H           2       4.028204  0.201133
O            1      15.999405  0.798867

# C13 CO2
carbon13_co2 = Formula("[13C]O2")
carbon13_co2
Formula('[13C]O2')
carbon13_co2.mass
45.00216483507
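
The masses above are consistent with a simple rule: atoms written as a specific isotope contribute that isotope's exact mass, while all other atoms contribute the element's average mass. The sketch below reproduces both values in plain Python. The deuterium mass comes from the ELEMENTS["H"] output above, the oxygen average from the composition tables, and the carbon-13 mass is an assumed literature value; the function name is illustrative.

```python
# Exact isotope masses in u. The deuterium mass is copied from the
# ELEMENTS["H"] output above; the carbon-13 mass is an assumed literature value.
ISOTOPE_MASS = {"2H": 2.01410177812, "13C": 13.00335483507}
# Average atomic mass of oxygen in u, as listed in the composition tables above.
AVERAGE_MASS = {"O": 15.999405}


def labelled_mass(counts):
    """Sum masses, using exact values for labelled atoms and averages otherwise."""
    total = 0.0
    for symbol, n in counts.items():
        mass = ISOTOPE_MASS.get(symbol, AVERAGE_MASS.get(symbol))
        if mass is None:
            raise KeyError(f"no mass data for {symbol!r}")
        total += n * mass
    return total


heavy_water = labelled_mass({"2H": 2, "O": 1})
co2_13c = labelled_mass({"13C": 1, "O": 2})
print(heavy_water, co2_13c)
```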

Isotopic Distribution

The mass distribution spectrum of a molecule is an essential concept in chemistry, particularly in mass spectrometry. It provides insights into the isotopic composition of a molecule and how these isotopes contribute to the molecule’s overall mass.

Each element consists of isotopes, which are atoms with the same number of protons but different numbers of neutrons. This difference in neutrons results in isotopes of the same element having slightly different masses. When you analyze a molecule, these isotopic variations lead to a distribution of possible mass values rather than a single, definitive mass.

Consider a water molecule, which is composed of two hydrogen atoms and one oxygen atom. The hydrogen atom can exist as either protium (1H) or deuterium (2H), and oxygen can have isotopes like 16O, 17O, and 18O. The mass spectrum of water would show peaks corresponding to these isotopic combinations, such as H2[16O], H2[17O], and so on. Each peak represents a different isotopic version of water, contributing to the overall mass spectrum of the molecule.

For a given formula, we can calculate the isotopic distribution using the spectrum method. Let’s see how it works for the previously defined molecular formula of water:

water.spectrum().dataframe()
             Relative mass      Fraction   Intensity %        m/z
Mass number
18               18.010565  9.973406e-01  1.000000e+02  18.010565
19               19.015557  6.093273e-04  6.109521e-02  19.015557
20               20.014810  2.049629e-03  2.055094e-01  20.014810
21               21.021086  4.714508e-07  4.727079e-05  21.021086
22               22.027363  2.711125e-11  2.718354e-09  22.027363

We see that the most abundant isotopic combination is the one with the lowest mass, which is the one with the most abundant isotopes of each element. The other peaks are due to the presence of heavier isotopes of hydrogen and oxygen.
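
The Fraction column can be reproduced by combining the isotope abundances of every atom in the molecule. The following sketch does this in plain Python by repeated convolution over mass numbers. It is an illustration of the idea, not necessarily how molmass implements it. The hydrogen abundances are copied from the ELEMENTS["H"] output above, the oxygen abundances are assumed values, and all names are illustrative.

```python
from collections import defaultdict

# Natural abundances by mass number. Hydrogen values are copied from the
# ELEMENTS["H"] output above; oxygen values are assumed for illustration.
ABUNDANCE = {
    "H": {1: 0.999885, 2: 0.000115},
    "O": {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def convolve(dist_a, dist_b):
    """Combine two {mass number: probability} distributions."""
    out = defaultdict(float)
    for mass_a, p_a in dist_a.items():
        for mass_b, p_b in dist_b.items():
            out[mass_a + mass_b] += p_a * p_b
    return dict(out)


def isotopic_distribution(counts):
    """Return {mass number: fraction} for a composition such as {"H": 2, "O": 1}."""
    dist = {0: 1.0}
    for element, n in counts.items():
        for _ in range(n):
            dist = convolve(dist, ABUNDANCE[element])
    return dist


water_dist = isotopic_distribution({"H": 2, "O": 1})
base_peak = max(water_dist.values())
for mass_number in sorted(water_dist):
    fraction = water_dist[mass_number]
    # Intensity % is the fraction scaled so that the largest peak is 100.
    print(mass_number, f"{fraction:.6e}", f"{100 * fraction / base_peak:.6e}")
```

This sketch tracks only mass numbers. Computing the Relative mass column would also require carrying the abundance-weighted exact mass of each peak through the convolution.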

Sometimes we don’t want to see all the peaks, but only the most abundant ones. We can do this using the min_intensity parameter:

water.spectrum(min_intensity=0.01).dataframe()
             Relative mass  Fraction  Intensity %        m/z
Mass number
18               18.010565  0.997341   100.000000  18.010565
19               19.015557  0.000609     0.061095  19.015557
20               20.014810  0.002050     0.205509  20.014810

The same methods apply to larger molecules, such as chlorobenzene:

chlorobenzene = Formula("C6H5Cl")
chlorobenzene
Formula('C6H5Cl')
chlorobenzene.mass
112.557045
chlorobenzene.composition().dataframe()
         Count  Relative mass  Fraction
Element
C            6      72.064440  0.640248
H            5       5.039705  0.044775
Cl           1      35.452900  0.314977

df_chlorobenzene = chlorobenzene.spectrum(min_intensity=0.01).dataframe()
df_chlorobenzene
             Relative mass  Fraction  Intensity %         m/z
Mass number
112             112.007978  0.709836   100.000000  112.007978
113             113.011358  0.046473     6.546944  113.011358
114             114.005082  0.228390    32.174991  114.005082
115             115.008420  0.014888     2.097378  115.008420
116             116.011802  0.000407     0.057363  116.011802

The resulting DataFrame can be plotted as a stick spectrum with Matplotlib:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.vlines(x="m/z", ymin=0, ymax="Intensity %", data=df_chlorobenzene)

ax.set_xlim((min(df_chlorobenzene["m/z"]) - 1, max(df_chlorobenzene["m/z"]) + 1))

ax.set_xlabel("m/z")
ax.set_ylabel("Intensity %")

plt.show()

The m/z ratio, often encountered in mass spectrometry, is a fundamental concept used for analyzing and identifying substances based on their mass and charge characteristics.

  • “m” Refers to Mass: In the m/z ratio, “m” stands for the mass of the ion, expressed in unified atomic mass units (u). Note that this is the mass of the detected ion, not necessarily the molecular mass of the original compound. The distinction matters because, during mass spectrometry, molecules are often fragmented into ions.

  • “z” Refers to Charge: The “z” in the ratio represents the charge number of the ion, that is, the difference between its numbers of protons and electrons, which gives the ion its positive or negative charge. In most mass spectrometry applications, the ions are singly charged, meaning z is usually ±1. However, multiply charged ions can also occur.

  • Separation of Ions: In mass spectrometry, ions are separated based on their m/z ratio. Since the technique involves ionizing the sample, different fragments or ions have different m/z ratios, allowing for their separation and detection.

  • Identification of Compounds: By analyzing the m/z ratios of the ions produced, chemists can deduce the molecular structure of the original compound. Each m/z value can correspond to a specific fragment of the molecule, providing clues to its structure.

  • Quantitative Analysis: The intensity of the signal at each m/z ratio can be used to determine the concentration of specific ions in the sample, making mass spectrometry a valuable tool for quantitative analysis.

We can define a new chlorobenzene formula, now with a charge of 2+:

chlorobenzene_ion = Formula("[C6H5Cl]2+")
chlorobenzene_ion
Formula('[C6H5Cl]2+')
chlorobenzene_ion.spectrum(min_intensity=0.01).dataframe()
             Relative mass  Fraction  Intensity %        m/z
Mass number
112             112.006881  0.709836   100.000000  56.003440
113             113.010261  0.046473     6.546944  56.505131
114             114.003985  0.228390    32.174991  57.001992
115             115.007323  0.014888     2.097378  57.503662
116             116.010705  0.000407     0.057363  58.005353

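Note that the m/z column now reflects the charge of the ion: each value is the relative mass divided by 2. The relative masses are also slightly lower than for the neutral molecule, because two electrons have been removed.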
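
The numbers in the two chlorobenzene tables are consistent with subtracting one electron mass per positive charge and then dividing by the magnitude of the charge. Here is a minimal sketch of that arithmetic in plain Python. The electron mass is the ELECTRON.mass value shown in the Other features section, and the function name is illustrative rather than part of molmass.

```python
ELECTRON_MASS = 0.000548579909065  # u, the ELECTRON.mass value reported by molmass


def mass_to_charge(neutral_mass, charge):
    """Return (ion mass, m/z) for a neutral mass in u and an integer charge.

    Assumes the only change on ionization is the gain or loss of electrons.
    """
    ion_mass = neutral_mass - charge * ELECTRON_MASS
    if charge == 0:
        # The tables above list m/z equal to the mass for neutral formulas.
        return ion_mass, ion_mass
    return ion_mass, ion_mass / abs(charge)


# Monoisotopic peak of neutral chlorobenzene (mass number 112) with charge 2+.
ion_mass, mz = mass_to_charge(112.007978, 2)
print(f"{ion_mass:.6f} {mz:.6f}")
```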

Abbreviations

We have already seen that molmass recognizes D as the symbol for deuterium. Other abbreviations can be used to represent common chemical groups, such as Me for methyl and Et for ethyl. Let’s see how molmass deals with these abbreviations.

We can define ethanol by its usual formula:

ethanol = Formula("C2H5OH")
ethanol
Formula('C2H5OH')

And also by using the abbreviation for ethyl:

ethanol = Formula("EtOH")
ethanol
Formula('(C2H5)OH')

Empirical Formula

The empirical formula of a compound is a simple expression of the relative number of each type of atom in it. It’s the simplest integer ratio of the elements present in the compound, reflecting the composition in terms of the smallest whole numbers. The empirical formula doesn’t necessarily represent the exact numbers of atoms found in a molecule of the compound (that’s the molecular formula), but rather the simplest whole-number ratio between the elements.

  • Representation of Composition: The empirical formula shows the simplest ratio of the different atoms in a compound. For example, in glucose (C6H12O6), the empirical formula is CH2O, representing a 1:2:1 ratio of carbon to hydrogen to oxygen.
  • Derivation from Mass Percentages: It can be derived from the mass percentages of each element in a compound. This is a common way to determine empirical formulas in a laboratory setting.
  • Use in Stoichiometry: Empirical formulas are useful in stoichiometry for calculating reactant and product quantities in chemical reactions.
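
The derivation from mass percentages can be sketched as follows: divide each percentage by the element's atomic mass, normalize by the smallest result, and look for a small multiplier that brings every ratio close to a whole number. This is a simplified illustration in plain Python, not part of molmass. The atomic masses are rounded standard values, and the function name, the multiplier limit, and the tolerance are assumptions chosen for the example.

```python
# Rounded standard atomic masses in u (assumed values for illustration).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def empirical_from_percentages(percentages, max_multiplier=6, tolerance=0.05):
    """Estimate integer atom ratios from mass percentages.

    Tries small integer multipliers until every ratio is close to a whole number.
    """
    moles = {el: pct / ATOMIC_MASS[el] for el, pct in percentages.items()}
    smallest = min(moles.values())
    ratios = {el: m / smallest for el, m in moles.items()}
    for multiplier in range(1, max_multiplier + 1):
        scaled = {el: r * multiplier for el, r in ratios.items()}
        if all(abs(v - round(v)) <= tolerance for v in scaled.values()):
            return {el: round(v) for el, v in scaled.items()}
    raise ValueError("no small whole-number ratio found")


# Approximate mass percentages of glucose (C6H12O6).
glucose = empirical_from_percentages({"C": 40.00, "H": 6.71, "O": 53.29})
print(glucose)
```

Real measurements carry experimental error, so the tolerance usually needs adjusting to the quality of the data.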

Examples of Empirical Formulas

  1. Water (H2O): The empirical formula is H2O, indicating a 2:1 ratio of hydrogen to oxygen atoms.
  2. Ethylene (C2H4): The empirical formula is CH2, which represents the simplest whole-number ratio of carbon to hydrogen atoms in the molecule.
  3. Benzene (C6H6): Despite having six carbon and six hydrogen atoms, the empirical formula is CH, as it reflects the 1:1 ratio of carbon to hydrogen atoms.
  4. Acetic Acid (C2H4O2): Although the molecular formula is C2H4O2, the empirical formula is CH2O, showing the simplest ratio of 1 carbon atom to 2 hydrogen atoms to 1 oxygen atom.

In chemistry, the empirical formula is a fundamental concept used to understand the basic composition of a compound, especially useful when the molecular formula is complex or when only compositional data is available.

molmass can calculate the empirical formula of a compound through the empirical attribute:

acetic_acid = Formula("CH3COOH")
acetic_acid
Formula('CH3COOH')
acetic_acid.empirical
'CH2O'
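
Reducing a molecular formula to its empirical formula amounts to dividing every atom count by their greatest common divisor. The sketch below shows the idea in plain Python. It is an illustration and not the molmass implementation, and the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def empirical_counts(counts):
    """Divide all atom counts by their greatest common divisor."""
    divisor = reduce(gcd, counts.values())
    return {element: n // divisor for element, n in counts.items()}


def format_counts(counts):
    """Write counts as a formula string, omitting counts of 1."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts.items())


print(format_counts(empirical_counts({"C": 2, "H": 4, "O": 2})))   # acetic acid
print(format_counts(empirical_counts({"C": 6, "H": 12, "O": 6})))  # glucose
print(format_counts(empirical_counts({"C": 6, "H": 6})))           # benzene
```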

Ions

As well as neutral molecules, molmass can deal with ions. Let’s see how it works.

sulphate = Formula("[SO4]2-")
sulphate
Formula('[SO4]2-')
sulphate.charge
-2

Hydrates

The Formula class can also parse hydrates.

copper_sulphate_hydrated = Formula("CuSO4.5H2O")
copper_sulphate_hydrated
Formula('CuSO4(H2O)5')
copper_sulphate_hydrated.composition().dataframe()
         Count  Relative mass  Fraction
Element
Cu           1      63.546000  0.254505
H           10      10.079410  0.040369
O            9     143.994645  0.576706
S            1      32.064800  0.128421

Nucleotides

The common abbreviations for nucleotides are recognized by molmass:

nucleotides = Formula("ATCG")
nucleotides
Formula('((C10H12N5O5P)(C9H12N3O6P)(C10H12N5O6P)(C10H13N2O7P)H2O)')
nucleotides.mass
1253.804568992
nucleotides.composition().dataframe()
         Count  Relative mass  Fraction
Element
C           39     468.418860  0.373598
H           51      51.404991  0.040999
N           15     210.100545  0.167570
O           25     399.985125  0.319017
P            4     123.895048  0.098815

Other features

from molmass import ELECTRON, PROTON, NEUTRON
ELECTRON
Particle(name='Electron', mass=0.000548579909065, charge=-1.602176634e-19)
ELECTRON.mass  # relative mass of electron in atomic mass units (u)
0.000548579909065
PROTON
Particle(name='Proton', mass=1.007276466621, charge=1.602176634e-19)
NEUTRON
Particle(name='Neutron', mass=1.00866491595, charge=0.0)

Flask web application

You can launch the web application using the following command:

python -m molmass.web

A web browser will open with the application running. You can also access it by typing http://127.0.0.1:5001 in your browser. Enter a formula in the input field and click the “Submit” button to see the results.

Conclusion

molmass is a versatile tool that bridges chemistry and data analysis. Its ease of use, combined with powerful features, makes it a useful tool for chemists, educators, and data analysts in the field of chemistry.
