Class SimpleTextBKDWriter
- java.lang.Object
  - org.apache.lucene.codecs.simpletext.SimpleTextBKDWriter
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
final class SimpleTextBKDWriter extends java.lang.Object implements java.io.Closeable
Forked from BKDWriter and simplified/specialized for SimpleText's usage.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
private class | SimpleTextBKDWriter.OneDimensionBKDWriter |
-
Field Summary
Fields
Modifier and Type | Field | Description
protected int | bytesPerDim | How many bytes each value in each dimension takes.
private int | bytesPerDoc | How many bytes each doc takes in the fixed-width offline format.
static java.lang.String | CODEC_NAME |
(package private) int[] | commonPrefixLengths |
static float | DEFAULT_MAX_MB_SORT_IN_HEAP | Default maximum heap, in MB, to use before spilling to (slower) disk.
static int | DEFAULT_MAX_POINTS_IN_LEAF_NODE | Default maximum number of points in each leaf block.
protected FixedBitSet | docsSeen |
private boolean | finished |
static int | MAX_DIMS | Maximum number of dimensions (2 * max index dimensions).
static int | MAX_INDEX_DIMS | Maximum number of index dimensions.
private int | maxDoc |
(package private) double | maxMBSortInHeap |
protected byte[] | maxPackedValue | Maximum per-dim values, packed.
protected int | maxPointsInLeafNode |
private int | maxPointsSortInHeap |
protected byte[] | minPackedValue | Minimum per-dim values, packed.
protected int | numDataDims | How many dimensions we are storing at the leaf (data) nodes.
protected int | numIndexDims | How many dimensions we are indexing in the internal nodes.
protected int | packedBytesLength | numDataDims * bytesPerDim
protected int | packedIndexBytesLength | numIndexDims * bytesPerDim
protected long | pointCount |
private PointWriter | pointWriter |
(package private) BytesRefBuilder | scratch |
(package private) byte[] | scratch1 |
(package private) byte[] | scratch2 |
(package private) BytesRef | scratchBytesRef1 |
(package private) BytesRef | scratchBytesRef2 |
(package private) byte[] | scratchDiff |
(package private) TrackingDirectoryWrapper | tempDir |
(package private) java.lang.String | tempFileNamePrefix |
private IndexOutput | tempInput |
private long | totalPointCount | An upper bound on how many points the caller will add (includes deletions).
static int | VERSION_COMPRESSED_DOC_IDS |
static int | VERSION_COMPRESSED_VALUES |
static int | VERSION_CURRENT |
static int | VERSION_IMPLICIT_SPLIT_DIM_1D |
static int | VERSION_START |
-
Constructor Summary
Constructors
Constructor | Description
SimpleTextBKDWriter(int maxDoc, Directory tempDir, java.lang.String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) |
-
Method Summary
Modifier and Type | Method | Description
void | add(byte[] packedValue, int docID) |
private void | build(int nodeID, int leafNodeOffset, MutablePointValues reader, int from, int to, IndexOutput out, byte[] minPackedValue, byte[] maxPackedValue, byte[] splitPackedValues, long[] leafBlockFPs, int[] spareDocIds) |
private void | build(int nodeID, int leafNodeOffset, BKDRadixSelector.PathSlice points, IndexOutput out, BKDRadixSelector radixSelector, byte[] minPackedValue, byte[] maxPackedValue, byte[] splitPackedValues, long[] leafBlockFPs, int[] spareDocIds) | The array (sized numDims) of PathSlice describes the cell we have currently recursed to.
private void | checkMaxLeafNodeCount(int numLeaves) |
void | close() |
private void | computeCommonPrefixLength(HeapPointWriter heapPointWriter, byte[] commonPrefix) |
long | finish(IndexOutput out) | Writes the BKD tree to the provided IndexOutput and returns the file offset where the index was written.
long | getPointCount() | How many points have been added so far.
private void | newline(IndexOutput out) |
private void | rotateToTree(int nodeID, int offset, int count, byte[] index, java.util.List<byte[]> leafBlockStartValues) |
private static int | runLen(java.util.function.IntFunction<BytesRef> packedValues, int start, int end, int byteOffset) |
protected int | split(byte[] minPackedValue, byte[] maxPackedValue) |
private HeapPointWriter | switchToHeap(PointWriter source) | Pull a partition back into heap once the point count is low enough while recursing.
private boolean | valueInBounds(BytesRef packedValue, byte[] minPackedValue, byte[] maxPackedValue) | Called only in assert.
private boolean | valueInOrder(long ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset, int doc, int lastDoc) |
private boolean | valuesInOrderAndBounds(int count, int sortedDim, byte[] minPackedValue, byte[] maxPackedValue, java.util.function.IntFunction<BytesRef> values, int[] docs, int docsOffset) |
private java.lang.Error | verifyChecksum(java.lang.Throwable priorException, PointWriter writer) | Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
static void | verifyParams(int numDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) |
private void | write(IndexOutput out, java.lang.String s) |
private void | write(IndexOutput out, BytesRef b) |
long | writeField(IndexOutput out, java.lang.String fieldName, MutablePointValues reader) | Write a field from a MutablePointValues.
private long | writeField1Dim(IndexOutput out, java.lang.String fieldName, MutablePointValues reader) |
private long | writeFieldNDims(IndexOutput out, java.lang.String fieldName, MutablePointValues values) |
private void | writeIndex(IndexOutput out, long[] leafBlockFPs, byte[] splitPackedValues) | Subclass can change how it writes the index.
private void | writeInt(IndexOutput out, int x) |
protected void | writeLeafBlockDocs(IndexOutput out, int[] docIDs, int start, int count) |
protected void | writeLeafBlockPackedValues(IndexOutput out, int[] commonPrefixLengths, int count, int sortedDim, java.util.function.IntFunction<BytesRef> packedValues) |
private void | writeLeafBlockPackedValuesRange(IndexOutput out, int[] commonPrefixLengths, int start, int end, java.util.function.IntFunction<BytesRef> packedValues) |
private void | writeLong(IndexOutput out, long x) |
-
-
-
Field Detail
-
CODEC_NAME
public static final java.lang.String CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_COMPRESSED_DOC_IDS
public static final int VERSION_COMPRESSED_DOC_IDS
- See Also:
- Constant Field Values
-
VERSION_COMPRESSED_VALUES
public static final int VERSION_COMPRESSED_VALUES
- See Also:
- Constant Field Values
-
VERSION_IMPLICIT_SPLIT_DIM_1D
public static final int VERSION_IMPLICIT_SPLIT_DIM_1D
- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
bytesPerDoc
private final int bytesPerDoc
How many bytes each doc takes in the fixed-width offline format.
-
DEFAULT_MAX_POINTS_IN_LEAF_NODE
public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
Default maximum number of points in each leaf block.
- See Also:
- Constant Field Values
-
DEFAULT_MAX_MB_SORT_IN_HEAP
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
Default maximum heap, in MB, to use before spilling to (slower) disk.
- See Also:
- Constant Field Values
-
MAX_DIMS
public static final int MAX_DIMS
Maximum number of dimensions (2 * max index dimensions).
- See Also:
- Constant Field Values
-
MAX_INDEX_DIMS
public static final int MAX_INDEX_DIMS
Maximum number of index dimensions.
- See Also:
- Constant Field Values
-
numDataDims
protected final int numDataDims
How many dimensions we are storing at the leaf (data) nodes
-
numIndexDims
protected final int numIndexDims
How many dimensions we are indexing in the internal nodes
-
bytesPerDim
protected final int bytesPerDim
How many bytes each value in each dimension takes.
-
packedBytesLength
protected final int packedBytesLength
numDataDims * bytesPerDim
-
packedIndexBytesLength
protected final int packedIndexBytesLength
numIndexDims * bytesPerDim
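The two packed lengths are plain products of a dimension count and bytesPerDim. A minimal sketch of that arithmetic, assuming packedBytesLength counts the data dimensions (numDataDims); the class and method names below are hypothetical and not part of Lucene:

```java
// Illustrative arithmetic only; this is not the SimpleTextBKDWriter source.
final class PackedLengthsSketch {

    /** Bytes needed to hold every data dimension of one point. */
    static int packedBytesLength(int numDataDims, int bytesPerDim) {
        return Math.multiplyExact(numDataDims, bytesPerDim);
    }

    /** Bytes needed to hold only the indexed dimensions of one point. */
    static int packedIndexBytesLength(int numIndexDims, int bytesPerDim) {
        return Math.multiplyExact(numIndexDims, bytesPerDim);
    }

    public static void main(String[] args) {
        // Hypothetical shape: 4 data dims, 2 of them indexed, 8 bytes per dim.
        System.out.println(packedBytesLength(4, 8));      // prints 32
        System.out.println(packedIndexBytesLength(2, 8)); // prints 16
    }
}
```

Math.multiplyExact is used so that an overflowing product fails loudly instead of wrapping around.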
-
scratch
final BytesRefBuilder scratch
-
tempDir
final TrackingDirectoryWrapper tempDir
-
tempFileNamePrefix
final java.lang.String tempFileNamePrefix
-
maxMBSortInHeap
final double maxMBSortInHeap
-
scratchDiff
final byte[] scratchDiff
-
scratch1
final byte[] scratch1
-
scratch2
final byte[] scratch2
-
scratchBytesRef1
final BytesRef scratchBytesRef1
-
scratchBytesRef2
final BytesRef scratchBytesRef2
-
commonPrefixLengths
final int[] commonPrefixLengths
-
docsSeen
protected final FixedBitSet docsSeen
-
pointWriter
private PointWriter pointWriter
-
finished
private boolean finished
-
tempInput
private IndexOutput tempInput
-
maxPointsInLeafNode
protected final int maxPointsInLeafNode
-
maxPointsSortInHeap
private final int maxPointsSortInHeap
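This page does not document how maxPointsSortInHeap relates to maxMBSortInHeap and bytesPerDoc. The sketch below shows one plausible relationship under two stated assumptions: each fixed-width record is the packed value followed by a 4-byte doc ID, and the point budget is the heap budget divided by the record size. All names are hypothetical.

```java
// Assumed relationship, for illustration; verify against the actual source.
final class HeapBudgetSketch {

    /** Assumed record layout: packed value followed by a 4-byte doc ID. */
    static int bytesPerDoc(int packedBytesLength) {
        return packedBytesLength + Integer.BYTES;
    }

    /** How many fixed-width records fit in the given heap budget (assumed formula). */
    static long maxPointsInHeap(double maxMBSortInHeap, int bytesPerDoc) {
        return (long) ((maxMBSortInHeap * 1024 * 1024) / bytesPerDoc);
    }

    public static void main(String[] args) {
        int recordSize = bytesPerDoc(8);                        // 8-byte packed value
        System.out.println(recordSize);                         // prints 12
        System.out.println(maxPointsInHeap(16.0, recordSize));  // prints 1398101
    }
}
```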
-
minPackedValue
protected final byte[] minPackedValue
Minimum per-dim values, packed
-
maxPackedValue
protected final byte[] maxPackedValue
Maximum per-dim values, packed
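To make "per-dim values, packed" concrete: each dimension occupies its own bytesPerDim-wide slice of the array, and the minimum and maximum are tracked slice by slice. A minimal sketch, assuming unsigned byte-wise comparison of each slice; the helper name is hypothetical and this is not the class's own code:

```java
import java.util.Arrays;

// Illustrative only: tracks per-dimension bounds over packed byte[] values.
final class PackedBoundsSketch {

    /** Widens min and max in place so that they cover packedValue in every dimension. */
    static void update(byte[] packedValue, byte[] min, byte[] max, int numDims, int bytesPerDim) {
        for (int dim = 0; dim < numDims; dim++) {
            int from = dim * bytesPerDim;
            int to = from + bytesPerDim;
            if (Arrays.compareUnsigned(packedValue, from, to, min, from, to) < 0) {
                System.arraycopy(packedValue, from, min, from, bytesPerDim);
            }
            if (Arrays.compareUnsigned(packedValue, from, to, max, from, to) > 0) {
                System.arraycopy(packedValue, from, max, from, bytesPerDim);
            }
        }
    }

    public static void main(String[] args) {
        // Two dimensions, one byte each; the first point seeds both bounds.
        byte[] first = {5, (byte) 200};
        byte[] min = first.clone();
        byte[] max = first.clone();
        update(new byte[] {9, 3}, min, max, 2, 1);
        System.out.println((min[0] & 0xFF) + "," + (min[1] & 0xFF)); // prints 5,3
        System.out.println((max[0] & 0xFF) + "," + (max[1] & 0xFF)); // prints 9,200
    }
}
```

Note that the minimum and maximum are independent per dimension, so neither array has to equal any single point that was added.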
-
pointCount
protected long pointCount
-
totalPointCount
private final long totalPointCount
An upper bound on how many points the caller will add (includes deletions)
-
maxDoc
private final int maxDoc
-
-
Constructor Detail
-
SimpleTextBKDWriter
public SimpleTextBKDWriter(int maxDoc, Directory tempDir, java.lang.String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) throws java.io.IOException
- Throws:
java.io.IOException
-
-
Method Detail
-
verifyParams
public static void verifyParams(int numDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount)
-
add
public void add(byte[] packedValue, int docID) throws java.io.IOException
- Throws:
java.io.IOException
-
getPointCount
public long getPointCount()
How many points have been added so far
-
writeField
public long writeField(IndexOutput out, java.lang.String fieldName, MutablePointValues reader) throws java.io.IOException
Write a field from a MutablePointValues. This way of writing points is faster than regular writes with BKDWriter.add(byte[], int), since the points can be reordered before they are written to disk. This method does not use temporary disk space to reorder points.
- Throws:
java.io.IOException
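The idea of reordering points in memory before writing can be illustrated without any Lucene types. The sketch below is not this method's implementation: it assumes a single dimension, a hypothetical Point holder, and an order of unsigned packed value first, doc ID second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: sorts points on the heap, with no temporary files involved.
final class ReorderSketch {

    /** Hypothetical in-memory point: a packed value plus the document that owns it. */
    static final class Point {
        final byte[] packedValue;
        final int docID;

        Point(byte[] packedValue, int docID) {
            this.packedValue = packedValue;
            this.docID = docID;
        }
    }

    /** Returns a copy ordered by unsigned packed value, breaking ties by doc ID. */
    static List<Point> reorder(List<Point> points) {
        List<Point> sorted = new ArrayList<>(points);
        sorted.sort((a, b) -> {
            int cmp = Arrays.compareUnsigned(a.packedValue, b.packedValue);
            return cmp != 0 ? cmp : Integer.compare(a.docID, b.docID);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(
            new Point(new byte[] {(byte) 0x80}, 1),
            new Point(new byte[] {0x01}, 7),
            new Point(new byte[] {0x01}, 2));
        for (Point p : reorder(points)) {
            System.out.println(p.docID); // prints 2, then 7, then 1
        }
    }
}
```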
-
writeFieldNDims
private long writeFieldNDims(IndexOutput out, java.lang.String fieldName, MutablePointValues values) throws java.io.IOException
- Throws:
java.io.IOException
-
writeField1Dim
private long writeField1Dim(IndexOutput out, java.lang.String fieldName, MutablePointValues reader) throws java.io.IOException
- Throws:
java.io.IOException
-
rotateToTree
private void rotateToTree(int nodeID, int offset, int count, byte[] index, java.util.List<byte[]> leafBlockStartValues)
-
checkMaxLeafNodeCount
private void checkMaxLeafNodeCount(int numLeaves)
-
finish
public long finish(IndexOutput out) throws java.io.IOException
Writes the BKD tree to the provided IndexOutput and returns the file offset where the index was written.
- Throws:
java.io.IOException
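The contract of returning the offset where the index starts can be sketched with a plain in-memory stream. This assumes the leaf data is written first and the index after it; the stream type, method name, and text payloads are stand-ins, not the SimpleText file format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: a ByteArrayOutputStream stands in for an IndexOutput.
final class IndexOffsetSketch {

    /** Writes the leaf data, then the index, and returns the offset at which the index begins. */
    static long finish(ByteArrayOutputStream out, String leafBlocks, String index) {
        byte[] data = leafBlocks.getBytes(StandardCharsets.UTF_8);
        out.write(data, 0, data.length);
        long indexStart = out.size(); // position before the first index byte
        byte[] idx = index.getBytes(StandardCharsets.UTF_8);
        out.write(idx, 0, idx.length);
        return indexStart;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long indexStart = finish(out, "leaf 0\nleaf 1\n", "index\n");
        System.out.println(indexStart); // prints 14
    }
}
```

A reader that is handed the returned offset can seek straight to the index without scanning the leaf blocks.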
-
writeIndex
private void writeIndex(IndexOutput out, long[] leafBlockFPs, byte[] splitPackedValues) throws java.io.IOException
Subclass can change how it writes the index.
- Throws:
java.io.IOException
-
writeLeafBlockDocs
protected void writeLeafBlockDocs(IndexOutput out, int[] docIDs, int start, int count) throws java.io.IOException
- Throws:
java.io.IOException
-
writeLeafBlockPackedValues
protected void writeLeafBlockPackedValues(IndexOutput out, int[] commonPrefixLengths, int count, int sortedDim, java.util.function.IntFunction<BytesRef> packedValues) throws java.io.IOException
- Throws:
java.io.IOException
-
writeLeafBlockPackedValuesRange
private void writeLeafBlockPackedValuesRange(IndexOutput out, int[] commonPrefixLengths, int start, int end, java.util.function.IntFunction<BytesRef> packedValues) throws java.io.IOException
- Throws:
java.io.IOException
-
runLen
private static int runLen(java.util.function.IntFunction<BytesRef> packedValues, int start, int end, int byteOffset)
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
-
verifyChecksum
private java.lang.Error verifyChecksum(java.lang.Throwable priorException, PointWriter writer) throws java.io.IOException
Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
- Throws:
java.io.IOException
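The pattern of recording a checksum result as a suppressed exception relies on the standard Throwable.addSuppressed method. A minimal sketch of that pattern follows; the ChecksumCheck interface and method name are hypothetical, and the real method's rethrow behavior is not reproduced here.

```java
import java.io.IOException;

// Illustrative only: attaches a checksum verdict to an earlier failure.
final class ChecksumStatusSketch {

    /** Hypothetical stand-in for verifying the checksum of the source being read. */
    interface ChecksumCheck {
        void verify() throws IOException;
    }

    /** Runs the check and records its outcome on priorException as a suppressed exception. */
    static <T extends Throwable> T withChecksumStatus(T priorException, ChecksumCheck check) {
        try {
            check.verify();
            priorException.addSuppressed(new IOException("checksum passed"));
        } catch (IOException checksumFailure) {
            priorException.addSuppressed(new IOException("checksum failed", checksumFailure));
        }
        return priorException;
    }

    public static void main(String[] args) {
        RuntimeException e = withChecksumStatus(
            new RuntimeException("read failed"),
            () -> { throw new IOException("mismatch"); });
        System.out.println(e.getSuppressed()[0].getMessage()); // prints checksum failed
    }
}
```

Either way the original exception stays the primary failure, and the checksum verdict travels with it in the stack trace.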
-
valueInBounds
private boolean valueInBounds(BytesRef packedValue, byte[] minPackedValue, byte[] maxPackedValue)
Called only in assert
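A check that is called only from assert statements is skipped entirely unless the JVM runs with assertions enabled (java -ea). The sketch below shows what such a bounds check might look like, assuming unsigned per-dimension comparison on plain byte arrays; it uses byte[] instead of BytesRef and is not the private method itself.

```java
import java.util.Arrays;

// Illustrative only: a bounds check intended for use inside assert statements.
final class InBoundsSketch {

    /** True if every dimension of packedValue lies within [min, max], compared as unsigned bytes. */
    static boolean valueInBounds(byte[] packedValue, byte[] min, byte[] max, int numDims, int bytesPerDim) {
        for (int dim = 0; dim < numDims; dim++) {
            int from = dim * bytesPerDim;
            int to = from + bytesPerDim;
            if (Arrays.compareUnsigned(packedValue, from, to, min, from, to) < 0) {
                return false;
            }
            if (Arrays.compareUnsigned(packedValue, from, to, max, from, to) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] min = {0, 10};
        byte[] max = {5, 20};
        // Evaluated only when assertions are enabled.
        assert valueInBounds(new byte[] {3, 15}, min, max, 2, 1) : "value outside cell";
        System.out.println(valueInBounds(new byte[] {3, 15}, min, max, 2, 1)); // prints true
        System.out.println(valueInBounds(new byte[] {6, 15}, min, max, 2, 1)); // prints false
    }
}
```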
-
split
protected int split(byte[] minPackedValue, byte[] maxPackedValue)
-
switchToHeap
private HeapPointWriter switchToHeap(PointWriter source) throws java.io.IOException
Pull a partition back into heap once the point count is low enough while recursing.
- Throws:
java.io.IOException
-
build
private void build(int nodeID, int leafNodeOffset, MutablePointValues reader, int from, int to, IndexOutput out, byte[] minPackedValue, byte[] maxPackedValue, byte[] splitPackedValues, long[] leafBlockFPs, int[] spareDocIds) throws java.io.IOException
- Throws:
java.io.IOException
-
build
private void build(int nodeID, int leafNodeOffset, BKDRadixSelector.PathSlice points, IndexOutput out, BKDRadixSelector radixSelector, byte[] minPackedValue, byte[] maxPackedValue, byte[] splitPackedValues, long[] leafBlockFPs, int[] spareDocIds) throws java.io.IOException
The array (sized numDims) of PathSlice describes the cell we have currently recursed to.
- Throws:
java.io.IOException
-
computeCommonPrefixLength
private void computeCommonPrefixLength(HeapPointWriter heapPointWriter, byte[] commonPrefix)
-
valuesInOrderAndBounds
private boolean valuesInOrderAndBounds(int count, int sortedDim, byte[] minPackedValue, byte[] maxPackedValue, java.util.function.IntFunction<BytesRef> values, int[] docs, int docsOffset) throws java.io.IOException
- Throws:
java.io.IOException
-
valueInOrder
private boolean valueInOrder(long ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset, int doc, int lastDoc)
-
write
private void write(IndexOutput out, java.lang.String s) throws java.io.IOException
- Throws:
java.io.IOException
-
writeInt
private void writeInt(IndexOutput out, int x) throws java.io.IOException
- Throws:
java.io.IOException
-
writeLong
private void writeLong(IndexOutput out, long x) throws java.io.IOException
- Throws:
java.io.IOException
-
write
private void write(IndexOutput out, BytesRef b) throws java.io.IOException
- Throws:
java.io.IOException
-
newline
private void newline(IndexOutput out) throws java.io.IOException
- Throws:
java.io.IOException
-
-