python – Saving an Object (Data persistence)
You could use the pickle module in the standard library. Here's an elementary application of it to your example:
import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as outp:
    company1 = Company('banana', 40)
    pickle.dump(company1, outp, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, outp, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as inp:
    company1 = pickle.load(inp)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(inp)
    print(company2.name)  # -> spam
    print(company2.value)  # -> 42
You could also define your own simple utility like the following, which opens a file and writes a single object to it:
def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
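A matching reader is just as short. The following is only a sketch: load_object is a hypothetical name, not something from the pickle module, and a plain dict and a temporary directory stand in for the Company instance and a real path so the snippet runs on its own.

```python
import os
import pickle
import tempfile

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

def load_object(filename):
    """Hypothetical counterpart to save_object: read back one pickled object."""
    with open(filename, 'rb') as inp:
        return pickle.load(inp)

# A temporary directory keeps the example from touching real files.
path = os.path.join(tempfile.mkdtemp(), 'company1.pkl')
save_object({'name': 'banana', 'value': 40}, path)
restored = load_object(path)
print(restored)
```

Like save_object, this handles exactly one object per file; files holding several pickled objects need repeated pickle.load calls.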
Update
Since this is such a popular answer, I'd like to touch on a few slightly advanced usage topics.
cPickle (or _pickle) vs pickle
In Python 2, it's almost always preferable to use the cPickle module rather than pickle, because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they're equivalent and the C version will provide greatly superior performance. Switching to it couldn't be easier; just change the import statement to this:
import cPickle as pickle
In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically; see the question "What difference between pickle and _pickle in python 3?".
The rundown is that you could use something like the following to ensure that your code will always use the C version when it's available in both Python 2 and 3:
try:
    import cPickle as pickle
except ImportError:  # Python 3 has no cPickle; ImportError also covers versions before 3.6.
    import pickle
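On Python 3, one way to check whether the C implementation is actually in use is to compare pickle.Pickler with _pickle.Pickler. This is a minimal sketch that assumes CPython 3; the variable name uses_c_version is purely illustrative.

```python
import pickle

try:
    import _pickle  # C accelerator module; may be missing on some interpreters
except ImportError:
    _pickle = None

# When the accelerator is present, pickle re-exports its Pickler class.
uses_c_version = _pickle is not None and pickle.Pickler is _pickle.Pickler
print("C implementation in use:", uses_c_version)
```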
Data stream formats (protocols)
pickle can read and write files in several different, Python-specific formats, called protocols, as described in the documentation. Protocol version 0 is ASCII and therefore human-readable. Versions > 0 are binary, and the highest one available depends on what version of Python is being used. The default also depends on the Python version. In Python 2 the default was protocol version 0, but in Python 3.8.1 it's protocol version 4. In Python 3.x the module had a pickle.DEFAULT_PROTOCOL added to it, but that doesn't exist in Python 2.
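To see the difference between the formats, you can pickle the same object with two protocols and inspect the bytes. This is a small Python 3 sketch; the record dict is just stand-in data.

```python
import pickle

record = {"name": "banana", "value": 40}

# Protocol 0 output is plain ASCII, so it decodes cleanly as text.
ascii_pickle = pickle.dumps(record, protocol=0)
as_text = ascii_pickle.decode("ascii")

# Protocols 2 and up begin with a PROTO opcode (0x80) and the version byte.
binary_pickle = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)
print("protocol 0 as text:", repr(as_text))
print("binary header:", binary_pickle[:2])

# Loading needs no protocol argument; the format is detected automatically.
print(pickle.loads(ascii_pickle) == pickle.loads(binary_pickle))
```

The printed protocol numbers will vary with the Python version you run this on.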
Fortunately there's shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that's what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index.
So, instead of writing:
pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)
You can just write:
pickle.dump(obj, outp, -1)
Either way, you'd only have to specify the protocol once if you created a Pickler object for use in multiple pickle operations:
pickler = pickle.Pickler(outp, -1)
pickler.dump(obj1)
pickler.dump(obj2)
etc...
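Here is a self-contained version of that idea, including the reading side with a matching Unpickler. It's only a sketch: an in-memory io.BytesIO buffer stands in for a file opened in binary mode, and the sample objects are made up.

```python
import io
import pickle

buffer = io.BytesIO()  # stands in for a file opened with 'wb'

pickler = pickle.Pickler(buffer, -1)  # protocol is chosen once, here
pickler.dump({"name": "banana", "value": 40})
pickler.dump(["spam", 42])

buffer.seek(0)  # rewind, as if reopening the file with 'rb'
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
print(first, second)
```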
Note: If you're in an environment running different versions of Python, then you'll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
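As a sketch of what hardcoding might look like, the snippet below pins protocol 2, which was introduced in Python 2.3 and is readable by every Python 3 release. SHARED_PROTOCOL and dumps_portable are hypothetical names chosen for illustration.

```python
import pickle

# Hypothetical project-wide constant: the newest protocol every target
# interpreter can read. Protocol 2 works on Python 2.3+ and all of Python 3.
SHARED_PROTOCOL = 2

def dumps_portable(obj):
    """Pickle obj with the pinned protocol instead of the version default."""
    return pickle.dumps(obj, SHARED_PROTOCOL)

payload = dumps_portable({"name": "spam", "value": 42})
print(payload[:2])  # PROTO opcode followed by the version byte: b'\x80\x02'
```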
Multiple Objects
While a pickle file can contain any number of pickled objects, as shown in the above samples, when there's an unknown number of them, it's often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict, and write them all to the file in a single call:
tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')
and restore the list and everything in it later with:
with open('tech_companies.pkl', 'rb') as inp:
    tech_companies = pickle.load(inp)
The major advantage is that you don't need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question "Saving and loading multiple objects in pickle file?" for details on different ways to do this. Personally I liked @Lutz Prechelt's answer the best, so that's the approach used in the sample code below:
import pickle

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickle_loader(filename):
    """ Deserialize a file of pickled objects. """
    with open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # Raised once every object has been read.
                break

print('Companies in pickle file:')
for company in pickle_loader('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))
I think it's a pretty strong assumption that the object is a class. What if it's not a class? There's also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some Python objects (such as classes and functions) have attributes added to their __dict__ after creation, pickle doesn't respect the addition of those attributes (i.e. it forgets they were added, because pickle serializes by reference to the object definition).
In all these cases, pickle and cPickle can fail you horribly.
If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in Python.
We start with a class…
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
...
>>>
Now shut down, and restart…
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>>
Oops… pickle can't handle it. Let's try dill. We'll throw in another object type (a lambda) for good measure.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>>
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>>
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
...
>>>
And now read the file.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
...
>>> company1
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>
It works. The reason pickle fails, and dill doesn't, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.
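You can check the standard-library side of this without installing anything. The sketch below assumes Python 3, and can_pickle is a hypothetical helper, not part of pickle. It shows that pickle accepts a function it can look up by name but rejects a lambda, whose name <lambda> cannot be resolved on its module.

```python
import pickle

def can_pickle(obj):
    """Hypothetical helper: report whether the standard pickle module accepts obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

print(can_pickle({"name": "banana", "value": 40}))  # plain data -> True
print(can_pickle(len))                              # found by name in builtins -> True
print(can_pickle(lambda x: x))                      # no importable name -> False
```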
Actually, there's an easier way to save all these objects, especially if you have a lot of objects you've created. Just dump the whole Python session, and come back to it later.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>>
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>>
>>> dill.dump_session('dill.pkl')
>>>
Now shut down your computer, go enjoy an espresso or whatever, and come back later…
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>
The only major drawback is that dill is not part of the Python standard library. So if you can't install a Python package on your server, then you can't use it.
However, if you are able to install Python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/[email protected]#egg=dill. And you can get the latest released version with pip install dill.
Quick example using company1 from your question, with python3.
import pickle

# Save the file
with open('company1.pickle', 'wb') as f:
    pickle.dump(company1, f)

# Reload the file
with open('company1.pickle', 'rb') as f:
    company1_reloaded = pickle.load(f)
However, as this answer noted, pickle can fail for some objects (such as lambdas or classes defined in an interactive session), so you may be better off using dill.
import dill

# Save the file
with open('company1.pickle', 'wb') as f:
    dill.dump(company1, f)

# Reload the file
with open('company1.pickle', 'rb') as f:
    company1_reloaded = dill.load(f)