typeddfs.abs_dfs

Defines a low-level DataFrame subclass. It overrides many methods so that their results are converted back to cls.

Module Contents

class typeddfs.abs_dfs.AbsDf(data=None, index=None, columns=None, dtype=None, copy=False)

An abstract Pandas DataFrame subclass with additional methods.

classmethod _check(df) None

Should raise a typeddfs.df_errors.InvalidDfError or a subclass if the DataFrame is invalid.

classmethod can_read() Set[typeddfs.file_formats.FileFormat]

Returns all formats that can be read using read_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame type can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_read().

classmethod can_write() Set[typeddfs.file_formats.FileFormat]

Returns all formats that can be written to using write_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame type can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_write().

classmethod from_records(*args, **kwargs) __qualname__

Convert structured or record ndarray to DataFrame.

Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.

Parameters
  • data (structured ndarray, sequence of tuples or dicts, or DataFrame) – Structured input data.

  • index (str, list of fields, array-like) – Field of array to use as the index, alternately a specific set of input labels to use.

  • exclude (sequence, default None) – Columns or fields to exclude.

  • columns (sequence, default None) – Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).

  • coerce_float (bool, default False) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

  • nrows (int, default None) – Number of rows to read if data is an iterator.

Return type

DataFrame

See also

DataFrame.from_dict

DataFrame from dict of array-like or dicts.

DataFrame

DataFrame object creation using constructor.

Examples

Data can be provided as a structured ndarray:

>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of dicts:

>>> data = [{'col_1': 3, 'col_2': 'a'},
...         {'col_1': 2, 'col_2': 'b'},
...         {'col_1': 1, 'col_2': 'c'},
...         {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of tuples with corresponding columns:

>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
classmethod read_file(path: Union[pathlib.Path, str], *, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, hex_hash: Optional[str] = None, attrs: Optional[bool] = None, storage_options: Optional[pandas._typing.StorageOptions] = None) __qualname__

Reads from a file (or possibly URL), guessing the format from the filename extension. Delegates to the read_* functions of this class.

You can always write and then read back to get the same DataFrame:

# df is any DataFrame from typeddfs
# path can use any suffix
df.write_file(path)
df.read_file(path)

Text formats can always be compressed with .gz, .zip, .bz2, or .xz.

Supports:
  • .csv, .tsv, or .tab

  • .json

  • .xml

  • .feather

  • .parquet or .snappy

  • .h5 or .hdf

  • .xlsx, .xls, .odf, etc.

  • .toml

  • .properties

  • .ini

  • .fwf (fixed-width)

  • .flexwf (fixed-but-unspecified-width with an optional delimiter)

  • .txt, .lines, or .list
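
As a rough illustration of guessing the format from the filename extension, the sketch below splits a filename into a format suffix and an optional compression suffix. It is not the typeddfs implementation; the function name split_format_suffix and the constant COMPRESSION_SUFFIXES are hypothetical and exist only for this example.

```python
from pathlib import Path
from typing import Optional, Tuple

# Compression suffixes named in the text above.
COMPRESSION_SUFFIXES = {".gz", ".zip", ".bz2", ".xz"}


def split_format_suffix(path: str) -> Tuple[str, Optional[str]]:
    """Return (format suffix, compression suffix or None) for a filename.

    Hypothetical helper: the real library may resolve formats differently.
    """
    suffixes = Path(path).suffixes
    compression = None
    # A trailing compression suffix is peeled off first, e.g. ".csv.gz" -> ".gz".
    if suffixes and suffixes[-1].lower() in COMPRESSION_SUFFIXES:
        compression = suffixes.pop().lower()
    # Whatever suffix remains last is treated as the format.
    fmt = suffixes[-1].lower() if suffixes else ""
    return fmt, compression


print(split_format_suffix("data.csv.gz"))    # ('.csv', '.gz')
print(split_format_suffix("table.feather"))  # ('.feather', None)
```

Peeling off the compression suffix first means a name such as data.csv.gz resolves to the same format as data.csv.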

Parameters
  • path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).

  • file_hash – Check against a hash file specific to this file (e.g. <path>.sha1)

  • dir_hash – Check against a per-directory hash file

  • hex_hash – Check against this hex-encoded hash

  • attrs – Set dataset attributes/metadata (pd.DataFrame.attrs) from a JSON file. If True, uses typeddfs.df_typing.DfTyping.attrs_suffix. If a str or Path, uses that file. If None or False, does not set.

  • storage_options – Passed to Pandas

Returns

An instance of this class
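
The hash-related parameters can be pictured with a small standard-library sketch. It assumes that a per-file hash lives in a sidecar file named <path>.<algorithm> whose first whitespace-separated token is the hex digest; that layout, the function name check_hash, and the algorithm parameter are assumptions made for illustration, not the library's documented behavior.

```python
import hashlib
from pathlib import Path
from typing import Optional


def check_hash(path: Path, *, hex_hash: Optional[str] = None, algorithm: str = "sha1") -> bool:
    """Compare a file's digest with hex_hash, or with the sidecar <path>.<algorithm>.

    Hypothetical helper for illustration only.
    """
    actual = hashlib.new(algorithm, path.read_bytes()).hexdigest()
    if hex_hash is None:
        # Assumed layout: the sidecar file starts with the hex digest.
        sidecar = path.with_name(path.name + "." + algorithm)
        hex_hash = sidecar.read_text(encoding="utf-8").split()[0]
    return actual == hex_hash.strip().lower()
```

Passing hex_hash corresponds to checking against a known digest, while omitting it mimics the file_hash case of reading the expected digest from a file next to the data.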

classmethod read_url(url: str) __qualname__

Reads from a URL, guessing the format from the filename extension. Delegates to the read_* functions of this class.

See also

read_file()

Returns

An instance of this class

write_file(path: Union[pathlib.Path, str], *, overwrite: bool = True, mkdirs: bool = False, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, attrs: Optional[bool] = None, storage_options: Optional[pandas._typing.StorageOptions] = None, atomic: bool = False) Optional[str]

Writes to a file, guessing the format from the filename extension. Delegates to the to_* functions of this class (e.g. to_csv). Only includes file formats that can be read back in with the corresponding read_* methods.

Supports, where text formats permit optional .gz, .zip, .bz2, or .xz:
  • .csv, .tsv, or .tab

  • .json

  • .feather

  • .fwf (fixed-width)

  • .flexwf (columns aligned but using a delimiter)

  • .parquet or .snappy

  • .h5, .hdf, or .hdf5

  • .xlsx, .xls, and other variants for Excel

  • .odt and .ods (OpenOffice)

  • .xml

  • .toml

  • .ini

  • .properties

  • .pkl and .pickle

  • .txt, .lines, or .list; see to_lines() and read_lines()

See also

read_file()

Parameters
  • path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).

  • overwrite – If False, complain if the file already exists

  • mkdirs – Make the directory and parents if they do not exist

  • file_hash – Write a hash for this file. The filename will be path+"."+algorithm. If None, chooses according to self.get_typing().io.hash_file.

  • dir_hash – Append a hash for this file into a list. The filename will be the directory name suffixed by the algorithm, i.e. path.parent/(path.parent.name+"."+algorithm). If None, chooses according to self.get_typing().io.hash_dir.

  • attrs – Write dataset attributes/metadata (pd.DataFrame.attrs) to a JSON file. Uses typeddfs.df_typing.DfTyping.attrs_suffix. If None, chooses according to self.get_typing().io.use_attrs.

  • storage_options – Passed to Pandas

  • atomic – Write to a temporary file, then rename it to the target path
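
The hash filenames described for file_hash and dir_hash can be spelled out with pathlib. This is only a sketch of the naming rules stated above; the function names file_hash_path and dir_hash_path are hypothetical, and the algorithm name "sha256" is just an example value.

```python
from pathlib import Path


def file_hash_path(path: Path, algorithm: str) -> Path:
    # path + "." + algorithm, e.g. data/table.csv -> data/table.csv.sha256
    return path.with_name(path.name + "." + algorithm)


def dir_hash_path(path: Path, algorithm: str) -> Path:
    # path.parent / (path.parent.name + "." + algorithm),
    # e.g. data/table.csv -> data/data.sha256
    return path.parent / (path.parent.name + "." + algorithm)


target = Path("data") / "table.csv"
print(file_hash_path(target, "sha256").as_posix())  # data/table.csv.sha256
print(dir_hash_path(target, "sha256").as_posix())   # data/data.sha256
```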

Returns

Whatever the corresponding to_* method returns. This is usually either str or None.

Raises
  • InvalidDfError – If the DataFrame is not valid for this type

  • ValueError – If the type of a column or index name is non-str
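
The overwrite, mkdirs, and atomic options describe a common write pattern, sketched below with the standard library only. This is one possible way to write to a temporary file and then rename it, under the assumption that the temporary file is created in the target directory; it is not taken from the typeddfs source, and write_text_atomic is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def write_text_atomic(path: Path, text: str, *, overwrite: bool = True, mkdirs: bool = False) -> None:
    """Write text to path via a temporary file in the same directory (illustrative only)."""
    if mkdirs:
        # Make the directory and its parents if they do not exist.
        path.parent.mkdir(parents=True, exist_ok=True)
    if not overwrite and path.exists():
        raise FileExistsError(f"{path} already exists")
    # The temporary file sits next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this pattern, a reader of the target path sees either the old file or the complete new one, never a partially written file.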