A 1,000-line Python script produces different results on different operating systems, potentially tainting more than 100 published studies.
#chemistry #Python #glitch #operating #system #OS

A Code Glitch May Have Caused Errors In More Than 100 Published Studies - VICE

Scientists in Hawaiʻi have uncovered a glitch in a piece of code that could have yielded incorrect results in over 100 published studies that cited the original paper.

The glitch caused the results of a common chemistry computation to vary depending on the operating system used, leading to discrepancies among Mac, Windows, and Linux systems. The researchers published the revelation and a debugged version of the script, which amounts to roughly 1,000 lines of code, on Tuesday in the journal Organic Letters.

“This simple glitch in the original script calls into question the conclusions of a significant number of papers on a wide range of topics in a way that cannot be easily resolved from published information because the operating system is rarely mentioned,” the new paper reads. “Authors who used these scripts should certainly double-check their results and any relevant conclusions using the modified scripts in the [supplementary information].”
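
Because the operating system is rarely reported, one low-cost habit is to store it alongside every set of computed results. The following is a minimal sketch using Python's standard `platform` module. The function names `run_metadata` and `results_to_json` are hypothetical, and the example value is a placeholder, not real data:

```python
import json
import platform
import sys


def run_metadata():
    """Collect platform details worth reporting next to computed results."""
    return {
        "os": platform.system(),          # e.g. 'Linux', 'Darwin' (macOS), 'Windows'
        "os_release": platform.release(),
        "platform": platform.platform(),  # one-line human-readable summary
        "python": platform.python_version(),
        "python_executable": sys.executable,
    }


def results_to_json(results):
    """Serialize results together with metadata describing where they were computed."""
    payload = {"metadata": run_metadata(), "results": results}
    return json.dumps(payload, indent=2, sort_keys=True)


# Usage: the key and value below are placeholders, not real measurements.
print(results_to_json({"example_value": 0.0}))
```
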
Luo’s results did not match up with the NMR values that Williams’ group had previously calculated, and according to Sun, when his students ran the code on their computers, they realized that different operating systems were producing different results. Sun then adjusted the code to fix the glitch, which had to do with how different operating systems sort files.
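
The article does not show the faulty lines. The sketch below assumes the script gathers its input files with Python's `glob` module; `glob` appears in the import list, but that alone does not confirm it was the cause. Under that assumption, the failure mode is that `glob.glob` returns paths in whatever order the filesystem lists them, and the Python documentation notes that this order may or may not be sorted. The sketch contrasts a fragile listing with a deterministic one. Function names and file names are hypothetical:

```python
import glob
import os
import tempfile


def list_inputs_fragile(directory, pattern="*.out"):
    # Order follows the filesystem's directory listing, so it can differ
    # between operating systems (or even between filesystems on one OS).
    return glob.glob(os.path.join(directory, pattern))


def list_inputs_deterministic(directory, pattern="*.out"):
    # Sorting explicitly yields the same order on macOS, Windows and Linux.
    return sorted(glob.glob(os.path.join(directory, pattern)))


if __name__ == "__main__":
    # Demo with hypothetical file names in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["conf10.out", "conf2.out", "conf1.out"]:
            open(os.path.join(tmp, name), "w").close()
        print([os.path.basename(p) for p in list_inputs_fragile(tmp)])        # may vary
        print([os.path.basename(p) for p in list_inputs_deterministic(tmp)])  # stable
```

Plain `sorted` is lexicographic, so `conf10.out` comes before `conf2.out`. What matters is that the order is identical on every machine. If files from two lists are paired by position, matching them by file name is sturdier still.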
I'm not sure whether those supercomputers are owned by universities... Probably only a few US universities can afford them...
well, anybody using google during the research...
here are the imports from the script. i was hoping to catch them not using a library like numpy. that would be a near guarantee of this sort of thing happening. except they did use numpy.

honestly, this is exactly the sort of thing that libraries like numpy are supposed to take care of. perhaps they should have used scipy (though scipy is itself built on numpy).

```python
# __future__ imports must be the first statement in a module.
from __future__ import division, print_function

import argparse
import copy
import doctest
import glob
import logging
import math
import optparse
import os
import platform
import re
import shutil
import sqlite3
import subprocess
import sys
from functools import reduce
from multiprocessing import cpu_count
from os.path import *
from pprint import pprint
from string import Template

import numpy as np
import openbabel
import openbabel as ob
import pandas as pd
import pybel
from elements import ELEMENTS
from molmass import Formula
from pkg_resources import resource_filename
from rdkit import Chem
from rdkit.Chem import AllChem
from setuptools import find_packages, setup
from snakemake import snakemake
from statsmodels.stats.weightstats import DescrStatsW

from isicle import __version__, export
from isicle.resources import geometry
from isicle.resources.elements import ELEMENTS, Isotope
from isicle.utils import (inchi2key, pop_atom, push_atom, read_mass, read_mol,
                          read_pka, read_string, smi2key, tail, write_string)
```
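
Using numpy does not by itself guard against an ordering glitch. numpy computes whatever arrays it is handed and has no way to tell that two inputs are misaligned. The following is a minimal sketch with made-up toy numbers. It assumes, hypothetically, that per-file values are combined with per-file weights read from a separate set of files; a weighted average stands in for the real calculation, and the keys and function names are illustrative only:

```python
import numpy as np

# Toy, made-up inputs keyed by hypothetical run names.
values = {"run1": 10.0, "run2": 20.0, "run3": 30.0}
weights = {"run1": 0.7, "run2": 0.2, "run3": 0.1}


def weighted_average_by_position(value_list, weight_list):
    # Trusts that element i of each list describes the same run.
    return float(np.average(value_list, weights=weight_list))


def weighted_average_by_key(values, weights):
    # Aligns the two inputs by name, so the order they arrived in is irrelevant.
    keys = sorted(values)
    return float(np.average([values[k] for k in keys],
                            weights=[weights[k] for k in keys]))


# Same numbers, but the weights arrive in a different order than the values,
# as can happen when two file lists come back from the OS in different orders.
aligned = weighted_average_by_position([10.0, 20.0, 30.0], [0.7, 0.2, 0.1])
misaligned = weighted_average_by_position([10.0, 20.0, 30.0], [0.1, 0.2, 0.7])
print(aligned, misaligned, weighted_average_by_key(values, weights))
```

Both positional calls run without error. Only the key-based version is immune to the order in which the inputs were listed.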