To ban or not to ban, that’s the pickle
While Hugging Face supports machine learning (ML) models in various formats, Pickle is among the most prevalent thanks to the popularity of PyTorch, a widely used ML library written in Python that uses Pickle serialization and deserialization for models. Pickle is an official Python module for object serialization, which in programming languages means turning an object into a byte stream. The reverse process is called deserialization. In Python terminology, the two are pickling and unpickling.
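As a minimal illustration of pickling and unpickling, the sketch below round-trips a small dictionary through the standard `pickle` module. The `model_state` dictionary is a hypothetical stand-in for model data, not an actual PyTorch checkpoint.

```python
import pickle

# Hypothetical stand-in for model data; a real checkpoint is far larger.
model_state = {"layer": "linear", "weights": [0.1, 0.2, 0.3]}

# Pickling: turn the object into a byte stream.
blob = pickle.dumps(model_state)

# Unpickling: rebuild an equal object from the byte stream.
restored = pickle.loads(blob)

print(type(blob).__name__)
print(restored == model_state)
```

The restored object is an equal copy, not the same object, which is what makes the format convenient for saving and sharing models.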
The process of serialization and deserialization, especially of input from untrusted sources, has been the cause of many remote code execution vulnerabilities in a variety of programming languages. Similarly, the Python documentation for Pickle has a big red warning: “It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.”
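To make the warning concrete, here is a deliberately harmless sketch of how unpickling can run code. It relies on the documented `__reduce__` hook, which lets an object tell Pickle which callable to invoke on load. The `Payload` class and the expression `6 * 7` are illustrative assumptions; a real attacker would substitute a callable that does damage.

```python
import pickle


class Payload:
    """Hypothetical class used only to demonstrate the mechanism."""

    def __reduce__(self):
        # Tell pickle: "to rebuild this object, call eval('6 * 7')".
        # The callable and its arguments are chosen by whoever creates the pickle.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes calls eval() with the attacker-chosen string.
# The result is whatever that call returns, not a Payload instance.
result = pickle.loads(blob)
print(result)
```

Nothing in the byte stream needs the `Payload` class to exist on the victim's side. The stream only references `eval` and its argument, so simply loading the file is enough to trigger the call.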
That poses a problem for an open platform like Hugging Face, where users openly share and have to unpickle model data. On one hand, this opens the potential for abuse by ill-intentioned individuals who upload poisoned models, but on the other, banning this format would be too restrictive given PyTorch’s popularity. So Hugging Face chose the middle road, which is to attempt to scan and detect malicious Pickle files.
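One way such scanning can work in principle is to inspect a pickle's opcodes without ever loading it. The sketch below is not Hugging Face's scanner. It is a simplified illustration built on the standard `pickletools.genops` function, and the `SUSPICIOUS_OPCODES` set and `find_suspicious_opcodes` helper are hypothetical names chosen for this example.

```python
import pickle
import pickletools

# Opcodes that import names or call callables during unpickling.
# This list is an assumption made for illustration, not a complete policy.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def find_suspicious_opcodes(data: bytes) -> list:
    """Return the names of flagged opcodes, parsing the stream without executing it."""
    found = []
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.append(opcode.name)
    return found


class Payload:
    """Hypothetical object whose pickle calls eval() on load."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


plain = pickle.dumps({"weights": [0.1, 0.2]})
risky = pickle.dumps(Payload())

print(find_suspicious_opcodes(plain))
print(find_suspicious_opcodes(risky))
```

A check this coarse would also flag legitimate PyTorch models, since they rely on the same opcodes to rebuild tensors. A practical scanner therefore has to look at which names are imported, and any such list is something an attacker can try to work around.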