typeddfs.abs_dfs

Defines a low-level DataFrame subclass. It overrides many methods so that their results are converted back to cls.

Module Contents

class typeddfs.abs_dfs.AbsDf
classmethod _check(cls, df) None

Should raise a typeddfs.df_errors.InvalidDfError (or a subclass) if it finds issues.

classmethod can_read(cls) Set[typeddfs.file_formats.FileFormat]

Returns all formats that can be read using read_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_read().

classmethod can_write(cls) Set[typeddfs.file_formats.FileFormat]

Returns all formats that can be written to using write_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame type can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_write().

classmethod from_records(cls, *args, **kwargs) __qualname__
classmethod read_file(cls, path: Union[pathlib.Path, str], *, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, hex_hash: Optional[str] = None, attrs: Optional[bool] = None) __qualname__

Reads from a file (or possibly URL), guessing the format from the filename extension. Delegates to the read_* functions of this class.

You can always write and then read back to get the same DataFrame:

    # df is any DataFrame from typeddfs
    # path can use any suffix
    df.write_file(path)
    df.read_file(path)

Text formats can always be compressed with .gz, .zip, .bz2, or .xz.

Supports:
  • .csv, .tsv, or .tab

  • .json

  • .xml

  • .feather

  • .parquet or .snappy

  • .h5 or .hdf

  • .xlsx, .xls, .odf, etc.

  • .toml

  • .properties

  • .ini

  • .fwf (fixed-width)

  • .flexwf (fixed-but-unspecified-width with an optional delimiter)

  • .txt, .lines, or .list
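
As a rough illustration of what "guessing the format from the filename extension" can look like, here is a minimal stdlib-only sketch. It is not the typeddfs implementation: guess_format, FORMAT_BY_SUFFIX, and the format labels are hypothetical names, and the mapping covers only a subset of the suffixes listed above.

```python
from pathlib import Path
from typing import Optional, Tuple

# Compression suffixes named in the text above.
COMPRESSION_SUFFIXES = {".gz", ".zip", ".bz2", ".xz"}

# Hypothetical mapping from suffix to a format label (illustrative subset).
FORMAT_BY_SUFFIX = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".tab": "tsv",
    ".json": "json",
    ".feather": "feather",
    ".parquet": "parquet",
    ".snappy": "parquet",
    ".txt": "lines",
    ".lines": "lines",
    ".list": "lines",
}


def guess_format(path) -> Tuple[str, Optional[str]]:
    """Return (format label, compression suffix or None) for a filename."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    compression = None
    # Peel off a trailing compression suffix, if there is one.
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        compression = suffixes.pop()
    # The last remaining suffix decides the format.
    if not suffixes or suffixes[-1] not in FORMAT_BY_SUFFIX:
        raise ValueError(f"Cannot guess a format for {path}")
    return FORMAT_BY_SUFFIX[suffixes[-1]], compression
```

With this sketch, guess_format("data.csv.gz") yields the label "csv" together with the compression suffix ".gz", while a name with no recognized suffix raises ValueError.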

Parameters
  • path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).

  • file_hash – Check against a hash file specific to this file (e.g. <path>.sha1)

  • dir_hash – Check against a per-directory hash file

  • hex_hash – Check against this hex-encoded hash

  • attrs – Set dataset attributes/metadata (pd.DataFrame.attrs) from a JSON file. If True, uses typeddfs.df_typing.DfTyping.attrs_suffix. If a str or Path, uses that file. If None or False, does not set.

Returns

An instance of this class
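
To make the hex_hash check more concrete, the following stdlib-only sketch compares a file's digest with an expected hex string. It is an assumption-laden illustration, not the library's code: hex_digest and matches_hex_hash are hypothetical helpers, and SHA-1 is chosen only because the parameter description mentions a <path>.sha1 file.

```python
import hashlib
from pathlib import Path


def hex_digest(path, algorithm: str = "sha1") -> str:
    """Compute the hex-encoded digest of a file's bytes."""
    digest = hashlib.new(algorithm)
    digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def matches_hex_hash(path, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Compare the file's digest with an expected hex string.

    Surrounding whitespace and letter case in expected_hex are ignored.
    """
    return hex_digest(path, algorithm) == expected_hex.strip().lower()
```

A caller could run matches_hex_hash(path, known_hash) before parsing the file and refuse to read it if the result is False.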

classmethod read_url(cls, url: str) __qualname__

Reads from a URL, guessing the format from the filename extension. Delegates to the read_* functions of this class.

See also

read_file()

Returns

An instance of this class

write_file(self, path: Union[pathlib.Path, str], *, overwrite: bool = True, mkdirs: bool = False, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, attrs: Optional[bool] = None) Optional[str]

Writes to a file, guessing the format from the filename extension. Delegates to the to_* functions of this class (e.g. to_csv). Only includes file formats that can be read back in with the corresponding read_* methods.

Supports, where text formats permit optional .gz, .zip, .bz2, or .xz:
  • .csv, .tsv, or .tab

  • .json

  • .feather

  • .fwf (fixed-width)

  • .flexwf (columns aligned but using a delimiter)

  • .parquet or .snappy

  • .h5, .hdf, or .hdf5

  • .xlsx, .xls, and other variants for Excel

  • .odt and .ods (OpenOffice)

  • .xml

  • .toml

  • .ini

  • .properties

  • .pkl and .pickle

  • .txt, .lines, or .list; see to_lines() and read_lines()

See also

read_file()

Parameters
  • path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).

  • overwrite – If False, complain if the file already exists

  • mkdirs – Make the directory and parents if they do not exist

  • file_hash – Write a hash for this file. The filename will be path+"."+algorithm. If None, chooses according to self.get_typing().io.hash_file.

  • dir_hash – Append a hash for this file into a list. The filename will be the directory name suffixed by the algorithm (i.e. path.parent/(path.parent.name+"."+algorithm)). If None, chooses according to self.get_typing().io.hash_dir.

  • attrs – Write dataset attributes/metadata (pd.DataFrame.attrs) to a JSON file. Uses typeddfs.df_typing.DfTyping.attrs_suffix. If None, chooses according to self.get_typing().io.use_attrs.
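
The hash filenames described for file_hash and dir_hash can be spelled out with pathlib. This is a small sketch of the naming rules only. The function names are hypothetical, and the default algorithm "sha1" is an assumption taken from the <path>.sha1 example in read_file.

```python
from pathlib import Path


def file_hash_path(path, algorithm: str = "sha1") -> Path:
    """Per-file hash file: path + "." + algorithm."""
    path = Path(path)
    return path.with_name(path.name + "." + algorithm)


def dir_hash_path(path, algorithm: str = "sha1") -> Path:
    """Per-directory hash file: path.parent / (path.parent.name + "." + algorithm)."""
    parent = Path(path).parent
    return parent / (parent.name + "." + algorithm)
```

For a target of out/data.csv.gz, the first helper gives out/data.csv.gz.sha1 and the second gives out/out.sha1.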

Returns

Whatever the corresponding method on pd.to_* returns. This is usually either str or None.

Raises
  • InvalidDfError – If the DataFrame is not valid for this type

  • ValueError – If the type of a column or index name is non-str
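
The overwrite and mkdirs parameters described above can be pictured as a pre-write check. The sketch below is illustrative only: prepare_target is a hypothetical helper, and raising FileExistsError is an assumption, since this page does not say which exception write_file uses when overwrite is False.

```python
from pathlib import Path


def prepare_target(path, *, overwrite: bool = True, mkdirs: bool = False) -> Path:
    """Validate a write target and optionally create its parent directories."""
    path = Path(path)
    # overwrite=False: complain if the file already exists.
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists")
    # mkdirs=True: make the directory and its parents if they do not exist.
    if mkdirs:
        path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Running the existence check before creating directories means a refused write leaves the filesystem untouched.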