Welcome to Simple Markdown splitter documentation!

simplemarkdownsplitter.split module

Main module for splitting Markdown content into smaller chunks.

simplemarkdownsplitter.split.combine_chunks_to_match_max_length(chunks: list[str], max_length: int) → list[str]

Combine multiple chunks into one if their combined length fits within the specified maximum length. Chunks that already exceed the maximum length are not modified.

Parameters:
  • chunks (list[str]) – A list of markdown content chunks.

  • max_length (int) – The maximum length of each chunk.

Returns:

A list of markdown content chunks, combined where possible.

Return type:

list[str]
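
The combining behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name combine_chunks_sketch, the greedy merging of neighbouring chunks, and the newline separator are all assumptions made for illustration.

```python
def combine_chunks_sketch(
    chunks: list[str], max_length: int, sep: str = "\n"
) -> list[str]:
    """Greedily merge neighbouring chunks while the result fits in max_length.

    Hypothetical stand-in: the separator and the greedy strategy are assumptions.
    """
    combined: list[str] = []
    for chunk in chunks:
        # Merge into the previous chunk only if the joined text still fits.
        if combined and len(combined[-1]) + len(sep) + len(chunk) <= max_length:
            combined[-1] = combined[-1] + sep + chunk
        else:
            # Oversized chunks end up here and are passed through unchanged.
            combined.append(chunk)
    return combined


result = combine_chunks_sketch(["aaa", "bbb", "cccccccccccc", "dd"], max_length=8)
```

In this sketch the first two chunks are merged because their joined length is 7, while the 12-character chunk is already over the limit and is left as it is.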

simplemarkdownsplitter.split.force_split_too_long_chunks(chunks: list[str], max_length: int) → list[str]

Force-split chunks that exceed the maximum length. Such chunks are simply truncated to the maximum length, with ‘…’ at the end, so Markdown formatting might be broken.

Parameters:
  • chunks (list[str]) – A list of markdown content chunks.

  • max_length (int) – The maximum length of each chunk.

Returns:

A list of markdown content chunks, with overlong chunks forcibly truncated.

Return type:

list[str]
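
The truncation described above might look roughly like the following sketch. It is not the library's code: force_truncate_sketch is a hypothetical name, and the sketch assumes that the ellipsis counts toward max_length and that the cut-off remainder is discarded. The description does not specify either point.

```python
ELLIPSIS = "\u2026"  # the single character '…'


def force_truncate_sketch(chunks: list[str], max_length: int) -> list[str]:
    """Truncate every overlong chunk so that it ends with an ellipsis.

    Assumption: the ellipsis is included in the max_length budget.
    """
    result: list[str] = []
    for chunk in chunks:
        if len(chunk) > max_length:
            keep = max(max_length - len(ELLIPSIS), 0)
            result.append(chunk[:keep] + ELLIPSIS)
        else:
            result.append(chunk)
    return result


truncated = force_truncate_sketch(["short", "0123456789"], max_length=6)
```

Because the cut ignores Markdown structure, a truncated chunk can end in the middle of a link, an emphasis marker, or a code fence.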

simplemarkdownsplitter.split.split(contents: str, max_length: int, force: bool = False) → list[str]

Splits the given Markdown content into smaller chunks based on the specified maximum length. Chunks can be:

  • list entries: every paragraph in a list entry is kept in the same chunk

  • code blocks

  • paragraphs split with single newline characters

By default, list entries are not split even if they exceed the max_length argument. Code blocks are split by lines, but by default lines that are too long are not split. You can force splitting by setting the force argument to True; however, this can break Markdown formatting, because the chunks are simply truncated.

Parameters:
  • contents (str) – The markdown content to be split.

  • max_length (int) – The maximum length of each chunk.

  • force (bool, optional) – If True, forces the splitting of chunks that exceed the maximum length. Can break formatting between chunks. Defaults to False.

Returns:

A list of markdown content chunks, each with a length up to max_length.

If force is False (the default), chunks might be longer than max_length when there is no natural way of breaking them up.

Return type:

list[str]
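
Because a non-forced split may return chunks longer than max_length, a caller may want to check the result before using it. The sketch below shows one way to do that. find_oversized is a hypothetical helper, not part of the library, and the example list stands in for the output of split.

```python
def find_oversized(chunks: list[str], max_length: int) -> list[int]:
    """Return the indices of chunks that are longer than max_length."""
    return [i for i, chunk in enumerate(chunks) if len(chunk) > max_length]


# Stand-in for the result of a non-forced split with max_length=10.
example_chunks = ["- item", "a paragraph that is longer than ten characters"]
print(find_oversized(example_chunks, 10))  # prints [1]
```

If the returned list is not empty, the caller can decide whether to accept the longer chunks or to call split again with force set to True.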

simplemarkdownsplitter.split.split_code_chunk(chunk: str, max_length: int) → list[str]

Split a single code block chunk into smaller chunks if it exceeds the maximum length. Code blocks are identified by triple backticks at the start and end of the block. Code syntax should be preserved. Code blocks that are too long are split by lines. Chunks may still be too long if a single line is longer than the maximum length.

Parameters:
  • chunk (str) – A markdown code block to split.

  • max_length (int) – The maximum length of each chunk.

Returns:

A list of shortened code block chunks.

Return type:

list[str]
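
Line-based splitting of a code block can be sketched as follows. This is an illustration, not the library's implementation: split_code_chunk_sketch is a hypothetical name, and the sketch assumes that the first line of the chunk is the opening fence, that the last line is the closing fence, and that both are repeated around every piece.

```python
FENCE = "`" * 3  # triple backticks


def split_code_chunk_sketch(chunk: str, max_length: int) -> list[str]:
    """Split a fenced code block by lines, re-wrapping each piece in the fences."""
    if len(chunk) <= max_length:
        return [chunk]

    lines = chunk.split("\n")
    opening, body, closing = lines[0], lines[1:-1], lines[-1]

    def render(body_lines: list[str]) -> str:
        return "\n".join([opening, *body_lines, closing])

    pieces: list[str] = []
    current: list[str] = []
    for line in body:
        # Start a new piece when adding this line would exceed the limit.
        if current and len(render(current + [line])) > max_length:
            pieces.append(render(current))
            current = [line]
        else:
            current.append(line)
    if current:
        pieces.append(render(current))
    # A block with no body lines cannot be split any further.
    return pieces or [chunk]


block = "\n".join([FENCE + "python", "a = 1", "b = 2", "c = 3", FENCE])
pieces = split_code_chunk_sketch(block, max_length=25)
```

A single line that is longer than max_length still produces an oversized piece in this sketch, which matches the caveat in the description above.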

simplemarkdownsplitter.split.split_into_chunks(contents: str) → list[str]

Split Markdown content into chunks, e.g.:

  • list entries

  • code blocks

  • paragraphs split with single newline characters

Parameters:

contents (str) – The markdown content to be split.

Returns:

A list of markdown content chunks.

Return type:

list[str]
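
To make the chunk types concrete, here is a deliberately simplified sketch of such a chunker. It is not the library's algorithm: split_into_chunks_sketch is a hypothetical name, the sketch treats every non-empty line outside a code block as its own chunk, and it does not keep the paragraphs of a multi-paragraph list entry together.

```python
FENCE = "`" * 3  # triple backticks


def split_into_chunks_sketch(contents: str) -> list[str]:
    """Simplified chunker: fenced code blocks stay whole, other lines stand alone."""
    chunks: list[str] = []
    code_lines: list[str] = []
    in_code = False
    for line in contents.split("\n"):
        if line.strip().startswith(FENCE):
            code_lines.append(line)
            if in_code:
                # Closing fence: emit the whole code block as one chunk.
                chunks.append("\n".join(code_lines))
                code_lines = []
            in_code = not in_code
        elif in_code:
            code_lines.append(line)
        elif line.strip():
            chunks.append(line)
    if code_lines:
        # Unterminated fence: keep whatever was collected.
        chunks.append("\n".join(code_lines))
    return chunks


sample = "\n".join(["Intro", "- one", "- two", FENCE, "x = 1", "", "y = 2", FENCE, "Outro"])
sample_chunks = split_into_chunks_sketch(sample)
```

Blank lines inside the code block are preserved, while blank lines between paragraphs are dropped in this sketch.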

simplemarkdownsplitter.split.split_too_long_code_block_chunks(chunks: list[str], max_length: int) → list[str]

Split code block chunks that exceed the maximum length into smaller chunks. Code blocks are identified by triple backticks at the start and end of the block. Code syntax should be preserved. Code blocks that are too long are split by lines. Chunks may still be too long if a single line is longer than the maximum length.

If no chunks exceed the maximum length, the original list is returned.

Parameters:
  • chunks (list[str]) – A list of markdown content chunks.

  • max_length (int) – The maximum length of each chunk.

Returns:

A list of markdown content chunks, with shortened code blocks.

Return type:

list[str]
