typeddfs.abs_dfs
Defines a low-level DataFrame subclass.
It overrides a lot of methods to auto-change the type back to cls.
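As a rough illustration of what "auto-change the type back to cls" can mean, the sketch below overrides two pandas methods so that their results are re-wrapped in the caller's class. The class name `RewrappingDf` and the helper `_rewrap` are hypothetical and are not part of typeddfs; the library's actual mechanism may differ.

```python
import pandas as pd


class RewrappingDf(pd.DataFrame):
    """Hypothetical subclass for illustration; not part of typeddfs."""

    def _rewrap(self, result):
        # In-place operations return None; pass that through unchanged.
        if result is None:
            return None
        # Build a new instance of the caller's own class from the plain result.
        return self.__class__(result)

    def sort_values(self, *args, **kwargs):
        return self._rewrap(super().sort_values(*args, **kwargs))

    def reset_index(self, *args, **kwargs):
        return self._rewrap(super().reset_index(*args, **kwargs))


df = RewrappingDf({"a": [3, 1, 2], "b": ["x", "y", "z"]})
# Without the overrides, both calls would return a plain pd.DataFrame.
sorted_df = df.sort_values("a").reset_index(drop=True)
```

pandas also documents a `_constructor` property for subclasses, which is an alternative way to preserve the subclass across operations.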
Module Contents
- class typeddfs.abs_dfs.AbsDf(data=None, index=None, columns=None, dtype=None, copy=False)
An abstract Pandas DataFrame subclass with additional methods.
- classmethod _check(df) → None
Should raise a typeddfs.df_errors.InvalidDfError or a subclass for issues.
- classmethod can_read() → Set[typeddfs.file_formats.FileFormat]
Returns all formats that can be read using read_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame type can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_read().
- classmethod can_write() → Set[typeddfs.file_formats.FileFormat]
Returns all formats that can be written to using write_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame type can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_write().
- classmethod from_records(*args, **kwargs) → __qualname__
Convert structured or record ndarray to DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.
- Parameters
data (structured ndarray, sequence of tuples or dicts, or DataFrame) – Structured input data.
index (str, list of fields, array-like) – Field of array to use as the index, alternately a specific set of input labels to use.
exclude (sequence, default None) – Columns or fields to exclude.
columns (sequence, default None) – Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).
coerce_float (bool, default False) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
nrows (int, default None) – Number of rows to read if data is an iterator.
- Return type
DataFrame
See also
DataFrame.from_dict: DataFrame from dict of array-like or dicts.
DataFrame: DataFrame object creation using constructor.
Examples
Data can be provided as a structured ndarray:
>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
Data can be provided as a list of dicts:
>>> data = [{'col_1': 3, 'col_2': 'a'},
...         {'col_1': 2, 'col_2': 'b'},
...         {'col_1': 1, 'col_2': 'c'},
...         {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
Data can be provided as a list of tuples with corresponding columns:
>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
- classmethod read_file(path: Union[pathlib.Path, str], *, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, hex_hash: Optional[str] = None, attrs: Optional[bool] = None, storage_options: Optional[pandas._typing.StorageOptions] = None) → __qualname__
Reads from a file (or possibly URL), guessing the format from the filename extension. Delegates to the read_* functions of this class.
You can always write and then read back to get the same DataFrame:
    # df is any DataFrame from typeddfs
    # path can use any suffix
    df.write_file(path)
    df.read_file(path)
Text formats can always be compressed with .gz, .zip, .bz2, or .xz.
- Supports:
.csv, .tsv, or .tab
.json
.xml
.feather
.parquet or .snappy
.h5 or .hdf
.xlsx, .xls, .odf, etc.
.toml
.properties
.ini
.fwf (fixed-width)
.flexwf (fixed-but-unspecified-width with an optional delimiter)
.txt, .lines, or .list
- Parameters
path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).
file_hash – Check against a hash file specific to this file (e.g. <path>.sha1)
dir_hash – Check against a per-directory hash file
hex_hash – Check against this hex-encoded hash
attrs – Set dataset attributes/metadata (pd.DataFrame.attrs) from a JSON file. If True, uses typeddfs.df_typing.DfTyping.attrs_suffix. If a str or Path, uses that file. If None or False, does not set.
storage_options – Passed to Pandas
- Returns
An instance of this class
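The extension-based format guessing described for read_file can be pictured with the standard-library sketch below. It only separates an optional compression suffix from the format suffix; the function name `split_suffixes` and the constant `COMPRESSION_SUFFIXES` are hypothetical, and this is not how typeddfs itself is known to implement the lookup.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical constant for this sketch: the compression suffixes named above.
COMPRESSION_SUFFIXES = {".gz", ".zip", ".bz2", ".xz"}


def split_suffixes(path) -> Tuple[str, Optional[str]]:
    """Return (format_suffix, compression_suffix or None) for a path."""
    suffixes = Path(path).suffixes
    compression = None
    # A trailing compression suffix is peeled off first, if present.
    if suffixes and suffixes[-1].lower() in COMPRESSION_SUFFIXES:
        compression = suffixes.pop().lower()
    # Whatever suffix remains last identifies the format ("" if none).
    fmt = suffixes[-1].lower() if suffixes else ""
    return fmt, compression


csv_gz = split_suffixes("data/table.csv.gz")
feather = split_suffixes("table.feather")
```

A real dispatcher would then map the format suffix to the matching read_* function.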
- classmethod read_url(url: str) → __qualname__
Reads from a URL, guessing the format from the filename extension. Delegates to the read_* functions of this class.
- Returns
An instance of this class
- write_file(path: Union[pathlib.Path, str], *, overwrite: bool = True, mkdirs: bool = False, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, attrs: Optional[bool] = None, storage_options: Optional[pandas._typing.StorageOptions] = None, atomic: bool = False) → Optional[str]
Writes to a file, guessing the format from the filename extension. Delegates to the to_* functions of this class (e.g. to_csv). Only includes file formats that can be read back in with the corresponding read_* methods.
- Supports, where text formats permit optional .gz, .zip, .bz2, or .xz:
.csv, .tsv, or .tab
.json
.feather
.fwf (fixed-width)
.flexwf (columns aligned but using a delimiter)
.parquet or .snappy
.h5, .hdf, or .hdf5
.xlsx, .xls, and other variants for Excel
.odt and .ods (OpenOffice)
.xml
.toml
.ini
.properties
.pkl and .pickle
.txt, .lines, or .list; see to_lines() and read_lines()
- Parameters
path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).
overwrite – If False, complain if the file already exists
mkdirs – Make the directory and parents if they do not exist
file_hash – Write a hash for this file. The filename will be path+"."+algorithm. If None, chooses according to self.get_typing().io.hash_file.
dir_hash – Append a hash for this file into a list. The filename will be the directory name suffixed by the algorithm (i.e. path.parent/(path.parent.name+"."+algorithm)). If None, chooses according to self.get_typing().io.hash_dir.
attrs – Write dataset attributes/metadata (pd.DataFrame.attrs) to a JSON file. Uses typeddfs.df_typing.DfTyping.attrs_suffix. If None, chooses according to self.get_typing().io.use_attrs.
storage_options – Passed to Pandas
atomic – Write to a temporary file, then rename it
- Returns
Whatever the corresponding method on pd.to_* returns. This is usually either str or None
- Raises
InvalidDfError – If the DataFrame is not valid for this type
ValueError – If the type of a column or index name is non-str
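The atomic and file_hash behaviors can be sketched with the standard library alone. The example below writes to a temporary file in the target directory, renames it into place, and then writes a per-file hash at path+"."+algorithm. All function names are hypothetical, and the hash file's contents (a bare hex digest) are an assumption; typeddfs may format its hash files differently.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_hash_path(path: Path, algorithm: str) -> Path:
    # path + "." + algorithm, e.g. data.csv -> data.csv.sha256
    return path.with_name(path.name + "." + algorithm)


def dir_hash_path(path: Path, algorithm: str) -> Path:
    # path.parent / (path.parent.name + "." + algorithm)
    return path.parent / (path.parent.name + "." + algorithm)


def write_text_atomically(path: Path, text: str, algorithm: str = "sha256") -> str:
    """Write text via a temporary file plus rename, then write a hash file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
        # os.replace overwrites an existing target in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
    # Assumption: the hash file holds only the hex digest.
    file_hash_path(path, algorithm).write_text(digest, encoding="utf-8")
    return digest


demo_dir = Path(tempfile.mkdtemp())
target = demo_dir / "out" / "data.csv"
digest = write_text_atomically(target, "a,b\n1,2\n")
```

Creating the temporary file next to the target means a reader never sees a half-written file under the final name.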