Code Block
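languagego
// StringField abstracts the string column of a segment.
type StringField interface {
	// extract returns the strings stored at the given segment offsets.
	extract(segmentOffsets []int32) []string
	// serialize encodes the field as bytes, e.g. for storage as an index file.
	serialize() []byte
	// deserialize rebuilds the field from bytes produced by serialize.
	deserialize(data []byte)
}

// Filter evaluates expression against field and returns the matching segment offsets.
func Filter(expression string, field StringField) (segmentOffsets []int32)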


Because the number of entities in a Segment can be limited to a certain range, a segment offset fits in an int32.

The extract method of StringField returns the strings stored at the provided segment offsets.

The Filter function evaluates the expression string against a StringField and returns the segment offsets of the matching entities.

The serialize method encodes the StringField as a slice of bytes, which can be stored in object storage as an index file.

The deserialize method rebuilds a StringField object from the index file.
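
As an illustration of the serialize/deserialize round trip, here is a minimal C++ sketch. The byte layout (a 32-bit count followed by length-prefixed strings) and the helper names putU32, getU32, serializeStrings and deserializeStrings are assumptions made for this example; the design above does not specify the index file format. Only the list of strings is encoded here; other members of a concrete StringField could be encoded the same way or rebuilt after loading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Appends a 32-bit value in host byte order (a real format would fix the endianness).
void putU32(std::vector<char>& buf, uint32_t v) {
    char raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    buf.insert(buf.end(), raw, raw + sizeof v);
}

// Reads a 32-bit value at pos and advances pos; throws if the buffer is too short.
uint32_t getU32(const std::vector<char>& buf, std::size_t& pos) {
    uint32_t v;
    if (pos + sizeof v > buf.size()) throw std::runtime_error("truncated index file");
    std::memcpy(&v, buf.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Assumed layout: [count][len0][bytes0][len1][bytes1]...
std::vector<char> serializeStrings(const std::vector<std::string>& strs) {
    std::vector<char> buf;
    assert(strs.size() <= UINT32_MAX);
    putU32(buf, static_cast<uint32_t>(strs.size()));
    for (const auto& s : strs) {
        assert(s.size() <= UINT32_MAX);
        putU32(buf, static_cast<uint32_t>(s.size()));
        buf.insert(buf.end(), s.begin(), s.end());
    }
    return buf;
}

// Inverse of serializeStrings; throws on a truncated buffer.
std::vector<std::string> deserializeStrings(const std::vector<char>& buf) {
    std::size_t pos = 0;
    uint32_t count = getU32(buf, pos);
    std::vector<std::string> strs;
    for (uint32_t i = 0; i < count; ++i) {
        uint32_t len = getU32(buf, pos);
        if (pos + len > buf.size()) throw std::runtime_error("truncated index file");
        strs.emplace_back(buf.data() + pos, len);
        pos += len;
    }
    return strs;
}
```

Length prefixes keep the format independent of the string contents, so strings containing zero bytes or separators survive the round trip.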


An implementation of StringField for Historical Segment

The following is a C++ definition of HistoricalStringField1.


Code Block
languagecpp
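#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Opaque byte buffer holding one serialized chunk of the field.
class Blob {
    std::vector<char> blob_;
};

class HistoricalStringField1 {
 public:
    // Returns the strings stored at the given segment offsets.
    std::vector<std::string>
    extract(const std::vector<int32_t>& segmentOffsets);
    // Encodes the field as blobs, e.g. for storage as an index file.
    std::vector<Blob>
    serialize();
    // Rebuilds the field from blobs produced by serialize().
    void
    deserialize(const std::vector<Blob>& blobs);

 private:
    std::vector<std::string> strs;  // deduplicated strings, sorted ascending
    std::unordered_map<int32_t, std::vector<int32_t>> strOffsetToSegOffsets;
    std::vector<int32_t> segOffsetToStrOffset;
};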



The strs member contains all the strings after deduplication and is sorted in ascending order.

The strOffsetToSegOffsets member maps a string offset to segment offsets. A string can appear in multiple entities, so the value type here is a vector.

The segOffsetToStrOffset member maps a segment offset to a string offset. Using the string offset, we can retrieve the original string from strs.
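
To make the relationship between the three members concrete, the following C++ sketch builds them from the raw column. The struct StringIndex and the function buildStringIndex are hypothetical names introduced for this example, and the input is assumed to hold one string per entity, ordered by segment offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container mirroring the three members described above.
struct StringIndex {
    std::vector<std::string> strs;  // deduplicated, sorted ascending
    std::unordered_map<int32_t, std::vector<int32_t>> strOffsetToSegOffsets;
    std::vector<int32_t> segOffsetToStrOffset;
};

// Builds the index from the raw column; raw[i] is the string of the
// entity at segment offset i.
StringIndex buildStringIndex(const std::vector<std::string>& raw) {
    StringIndex idx;
    idx.strs = raw;
    std::sort(idx.strs.begin(), idx.strs.end());
    idx.strs.erase(std::unique(idx.strs.begin(), idx.strs.end()), idx.strs.end());

    idx.segOffsetToStrOffset.reserve(raw.size());
    for (int32_t seg = 0; seg < static_cast<int32_t>(raw.size()); ++seg) {
        // Every raw string is present in strs, so lower_bound finds an exact match.
        auto it = std::lower_bound(idx.strs.begin(), idx.strs.end(), raw[seg]);
        assert(it != idx.strs.end() && *it == raw[seg]);
        auto strOffset = static_cast<int32_t>(it - idx.strs.begin());
        idx.segOffsetToStrOffset.push_back(strOffset);
        idx.strOffsetToSegOffsets[strOffset].push_back(seg);
    }
    return idx;
}
```

For the raw column ["b", "a", "b", "c"], strs becomes ["a", "b", "c"], segOffsetToStrOffset becomes [1, 0, 1, 2], and string offset 1 ("b") maps to segment offsets [0, 2].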

Thus, the comparison operations ("==", "!=", "<", "<=", ">", ">=") are evaluated by a binary search on strs to find the matching string offsets, which are then converted to segment offsets through strOffsetToSegOffsets.
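
One possible way to evaluate these operators is sketched below in C++. The names SegOffsets, PostingMap, collect and filterCompare are hypothetical, and the operator is passed as a plain string for brevity; parsing a full expression is out of scope for this sketch. The two binary searches yield the range of string offsets equal to the value, and each operator selects a range of string offsets relative to it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using SegOffsets = std::vector<int32_t>;
using PostingMap = std::unordered_map<int32_t, SegOffsets>;

// Gathers the segment offsets of all string offsets in [first, last), sorted ascending.
SegOffsets collect(const PostingMap& postings, int32_t first, int32_t last) {
    SegOffsets out;
    for (int32_t s = first; s < last; ++s) {
        auto it = postings.find(s);
        if (it != postings.end()) out.insert(out.end(), it->second.begin(), it->second.end());
    }
    std::sort(out.begin(), out.end());
    return out;
}

// Evaluates `field <op> value` over sorted, deduplicated strs.
SegOffsets filterCompare(const std::vector<std::string>& strs, const PostingMap& postings,
                         const std::string& op, const std::string& value) {
    assert(std::is_sorted(strs.begin(), strs.end()));
    auto n = static_cast<int32_t>(strs.size());
    // lo: first string offset with strs[lo] >= value; hi: first with strs[hi] > value.
    auto lo = static_cast<int32_t>(std::lower_bound(strs.begin(), strs.end(), value) - strs.begin());
    auto hi = static_cast<int32_t>(std::upper_bound(strs.begin(), strs.end(), value) - strs.begin());
    if (op == "==") return collect(postings, lo, hi);
    if (op == "<") return collect(postings, 0, lo);
    if (op == "<=") return collect(postings, 0, hi);
    if (op == ">") return collect(postings, hi, n);
    if (op == ">=") return collect(postings, lo, n);
    if (op == "!=") {
        // Everything strictly below plus everything strictly above the value.
        SegOffsets out = collect(postings, 0, lo);
        SegOffsets rest = collect(postings, hi, n);
        out.insert(out.end(), rest.begin(), rest.end());
        std::sort(out.begin(), out.end());
        return out;
    }
    throw std::invalid_argument("unsupported operator: " + op);
}
```

Each comparison costs two binary searches on strs plus the work of gathering the result, and a value that does not occur in strs simply produces an empty range for "==".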

The extract interface only needs to map each segment offset to a string offset through segOffsetToStrOffset and then read the corresponding string from strs.
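
The lookup can be sketched in C++ as a free function. Passing the two members as parameters is an assumption made to keep the example self-contained; in the class they would be read from the object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the strings stored at the requested segment offsets, in request order.
std::vector<std::string> extract(const std::vector<std::string>& strs,
                                 const std::vector<int32_t>& segOffsetToStrOffset,
                                 const std::vector<int32_t>& segmentOffsets) {
    std::vector<std::string> out;
    out.reserve(segmentOffsets.size());
    for (int32_t seg : segmentOffsets) {
        // at() throws std::out_of_range if the caller passes an invalid segment offset.
        int32_t strOffset = segOffsetToStrOffset.at(seg);
        // Index invariant: every stored string offset points into strs.
        assert(strOffset >= 0 && static_cast<std::size_t>(strOffset) < strs.size());
        out.push_back(strs[strOffset]);
    }
    return out;
}
```

With strs = ["a", "b", "c"] and segOffsetToStrOffset = [1, 0, 1, 2], extracting segment offsets [3, 0] yields ["c", "b"].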

When there is no index file in the object store, QueryNode loads the original data from the object store and creates a StringField object from it.

When the index file exists, QueryNode loads the index file from the object store, calls the deserialize method of StringField, and generates a StringField object.