com.eireneh.bible.book.ser
Class SerBible

java.lang.Object
  com.eireneh.bible.book.AbstractBible
    com.eireneh.bible.book.VersewiseBible
      com.eireneh.bible.book.ser.SerBible
- All Implemented Interfaces:
- com.eireneh.bible.book.Bible, com.eireneh.bible.book.Book
- public class SerBible
- extends com.eireneh.bible.book.VersewiseBible
A Biblical source that comes from files on the local file system.
This format is designed to be fast, at any cost. Disk space therefore does not matter, which is good because early versions used about 100Mb!
This is a history of some of the design decisions that this class has been through.
Searching
I think that a Bible ought not to store anything other than Bible text. I have experimented with a search mechanism that cached searches very effectively; however it took up a lot of disk space, and only worked for one version. It might be good to have it work in a more generic way, and an in-memory cache would also be a good idea. So I am going to move the natty search bit into a caching class.
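As a rough illustration of the in-memory cache idea, here is a minimal sketch of a least-recently-used cache built on java.util.LinkedHashMap. The class name SearchCache and its size limit are hypothetical; nothing here is taken from the actual caching class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory cache of search results; not part of SerBible. */
final class SearchCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    SearchCache(int maxEntries) {
        // The third argument selects access order, so get() refreshes an entry.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; drops the least recently used entry when full.
        return size() > maxEntries;
    }
}
```

Because the cache lives in memory and is keyed only by the search term, it costs no disk space and is not tied to one version.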
Text Storage
It would be good to get a handle on the way the OLB and Sword and so on work:
- OLB: 2 core files: an index file that starts with text like: "AaronitesbaddonAbagthanarimabasedingtedAbbadaeelielonednegolbet" which is a strange sort of index. Possibly strings with start pos and length. Then data files, and plenty of other indexes.
- Theophilos: Single data file that begins: "aaron aaronites aarons abaddon abagtha abana abarim abase abased abasing abated" This is again an index type affair.
- Sword: All this VerseKey stuff ...
Priorities
What factors affect our design the most?
- Search Speed: Probably the biggest reason people will have to use this program initially will be the powerful search engine. This can be very demanding though, and every effort should be made to make best match searches fast.
- Size: Size is not a huge problem from a disk space point of view - the average hard disk is now about 10Gb. Looking at the various installations that I have, the average is a little short of 20Mb each. Generally each version takes up 3-5Mb. If we were to be over double this size and take up 50Mb total, I don't think there would be a huge problem.
  However many people will first come to use this program from a net download - now size is a huge problem. Maybe we should have a very very compact download that indexes itself on installation.
- Text Retrieval Speed: I do not see this as being a huge issue. The text generation time from reverse-engineering my concordance was acceptable if slow, so this should not be a big deal, and I guess it is very easily cacheable too.
Strategies
For a single verse we have 2 basic strategies. Have a single block of data that specifies the words, punctuation, and markup, or for each set of data we could have a separate source. Clearly there are also hybrid versions. The pros and cons of separate sources:
- Searches only have to read one file, and the information in it is denser (fewer disk reads for the wanted data). This also applies to the ability to ignore certain types of markup.
- It is easier to add/alter a single source of information - or even to share a source amongst versions. Maybe things like red lettering could benefit from this.
- Text display is slower because the information is spread over several files. But as mentioned above - who cares?

The sets of data to consider:
- Markup: Most markup is tied to a particular word, so we would need some way of attaching markup to words.
- Inter-Word Punctuation: We could do for punctuation exactly what we do for the words. List the options in a dictionary, and then write out an index. I guess there are fewer than 255 different types of inter-word punctuation (1 byte per inter-word gap), as opposed to 18360 different words (2 bytes per word).
  There are 32k words in the Bible - this would make the central data file about 64k in size!
- Case: To get down to 18k words you need to make "Foo" the same as "foo" and "FOO", however I guess that even making words case sensitive we would be under 65k words. Splitting case would not decrease file sizes (but may make it compress better), however it would introduce a new case file. Since there are only 4 cases (see PassageUtil) that is 0.25 bytes per word (8k for the whole Bible).
- Intra-Word Punctuation: Examples "-',". Examples of words that use these punctuations: Maher-Shalal-Hash-Baz, Aaron's, 144,000. Watch out for --. The NIV uses it to join sentences together--Something like this. However there is no space between these words. This is closely linked to the next item.
- Word Semantics: We could make the words "job", "jobs", and "job's" the same. Also "run", "runs", "running", "ran" and so on. Even "am", "are", "is". This would dramatically reduce the size of the dictionary, but make the text re-generation quite complex and the data generation nigh on impossible. But it would make for some really powerful searches (although possibly nothing that a thesaurus would not help).
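The dictionary and case ideas in the list above could be sketched roughly as follows: each word is stored as an index into a case-folded dictionary (small enough for 2 bytes), and the case of each word is stored separately as a 2-bit code, four to a byte. The class WordCodec and the four case code names are hypothetical, chosen for illustration; they are not the ones defined in PassageUtil.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: 2-byte dictionary indexes plus 2-bit case codes. */
final class WordCodec {
    // Illustrative case codes; the real ones live in PassageUtil.
    static final int LOWER = 0, TITLE = 1, UPPER = 2, MIXED = 3;

    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> words = new ArrayList<>();

    /** Returns the index of the case-folded word, adding it if it is new. */
    int indexFor(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        Integer idx = indexOf.get(key);
        if (idx == null) {
            idx = words.size();
            if (idx > 0xFFFF) {
                throw new IllegalStateException("dictionary no longer fits in 2 bytes");
            }
            indexOf.put(key, idx);
            words.add(key);
        }
        return idx;
    }

    String wordAt(int index) {
        return words.get(index);
    }

    /** Classifies a word; a single capital such as "I" counts as UPPER here. */
    static int caseOf(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (word.equals(lower)) return LOWER;
        if (word.equals(word.toUpperCase(Locale.ROOT))) return UPPER;
        String title = Character.toUpperCase(word.charAt(0)) + lower.substring(1);
        return word.equals(title) ? TITLE : MIXED;
    }

    /** Packs four 2-bit case codes into one byte, first code in the low bits. */
    static byte packCases(int c0, int c1, int c2, int c3) {
        return (byte) (c0 | (c1 << 2) | (c2 << 4) | (c3 << 6));
    }

    /** Extracts the 2-bit code at position 0..3 from a packed byte. */
    static int unpackCase(byte packed, int position) {
        return (packed >> (position * 2)) & 0x3;
    }
}
```

Folding case before the dictionary lookup is what makes "Foo", "foo" and "FOO" share one entry; the packed case byte is what would go in the separate case file.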
Distribution Licence: Project B is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 2 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The License is available on the internet, by writing to Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA, or locally at the Licence link below. The copyright to this program is held by its authors.
- Version:
- D4.I0.T0
Nested Class Summary
(package private) static class | SerBible.CustomArrayEnumeration | This customization just clips off the .ser from the array members
(package private) static class | SerBible.CustomFilenameFilter | Check that the directories in the version directory really represent versions.
(package private) static class | SerBible.Section | A simple class to hold an offset and length into the passages random access file
Field Summary
private long[] | letters | Some shortcuts into the list of names to help startsWith
protected static com.eireneh.util.Logger | log | The log stream
private java.lang.String | name | The name of this version
private static java.lang.String | PARSER | The SAX parser to use
private java.io.RandomAccessFile | ref_dat | The passages random access file
private java.util.SortedMap | ref_map | The hash of indexes into the passages file
private java.net.URL | url | The base url
private com.eireneh.bible.book.Version | version | The Version of the Bible that this produces
private long[] | xml_arr | The hash of indexes into the text file, one per verse.
private java.io.RandomAccessFile | xml_dat | The text random access file
Fields inherited from class com.eireneh.bible.book.VersewiseBible |
|
Fields inherited from class com.eireneh.bible.book.AbstractBible |
listeners, percent |
Constructor Summary
SerBible(java.lang.String name, java.net.URL url, boolean create) | Basic constructor for a SerBible
Method Summary
com.eireneh.bible.passage.Passage | findPassage(java.lang.String word) | For a given word find a list of references to it
void | flush() | Flush the data written to disk
void | foundPassage(java.lang.String word, com.eireneh.bible.passage.Passage ref) | Write the references for a Word
java.net.URL | getBaseURL() | The directory that holds the RawBible files
void | getDocument(com.eireneh.bible.book.BibleEle doc, com.eireneh.bible.passage.Passage ref) | Create an XML document for the specified Verses
com.eireneh.bible.book.BibleDriver | getDriver() | What driver is controlling this Bible?
org.jdom.Element | getElement(com.eireneh.bible.passage.Passage ref) | Retrieval: Use JDOM to retrieve some Bible data
java.lang.String | getName() | Meta-Information: What name can I use to get this Bible in a call to Bibles.getBible(name);
java.lang.String[] | getStartsWith(java.lang.String word) | Retrieval: Return an array of words that are used by this Bible that start with the given string.
java.lang.String | getText(com.eireneh.bible.passage.VerseRange range) | Create a String for the specified Verses
com.eireneh.bible.book.Version | getVersion() | Meta-Information: What version of the Bible is this?
java.util.Enumeration | listWords() | Retrieval: Get a list of the words used by this Version.
void | setDocument(com.eireneh.bible.book.BibleEle doc) | Write the XML to disk
void | setVersion(com.eireneh.bible.book.Version version) | Setup the Version information
Methods inherited from class com.eireneh.bible.book.VersewiseBible |
generate, generatePassages, generateText, getProperties |
Methods inherited from class com.eireneh.bible.book.AbstractBible |
addProgressListener, fireProgressMade, getPropertiesURL, removeProgressListener |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail
PARSER
private static final java.lang.String PARSER
- The SAX parser to use
- See Also:
- Constant Field Values
url
private java.net.URL url
- The base url
name
private java.lang.String name
- The name of this version
ref_dat
private java.io.RandomAccessFile ref_dat
- The passages random access file
ref_map
private java.util.SortedMap ref_map
- The hash of indexes into the passages file
xml_dat
private java.io.RandomAccessFile xml_dat
- The text random access file
xml_arr
private long[] xml_arr
- The hash of indexes into the text file, one per verse. Note that the
index in use is NOT the ordinal number of the verse, since ordinal numbers
are 1-based. The index into xml_arr is verse.getOrdinal() - 1
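The off-by-one noted above can be illustrated with a small sketch. VerseOffsets is a hypothetical stand-in for the xml_arr/xml_dat pair, and it assumes each verse record was written with writeUTF; the real on-disk record format is not documented here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of a 0-based offset table addressed by 1-based ordinals. */
final class VerseOffsets {
    // offsets[i] is the file position of the verse whose ordinal is i + 1.
    private final long[] offsets;

    VerseOffsets(long[] offsets) {
        this.offsets = offsets;
    }

    /** Maps a 1-based verse ordinal to its file offset. */
    long offsetFor(int ordinal) {
        if (ordinal < 1 || ordinal > offsets.length) {
            throw new IllegalArgumentException("ordinal out of range: " + ordinal);
        }
        return offsets[ordinal - 1];
    }

    /** Seeks to the verse and reads it, assuming records written with writeUTF. */
    String read(RandomAccessFile file, int ordinal) throws IOException {
        file.seek(offsetFor(ordinal));
        return file.readUTF();
    }
}
```

Keeping the subtraction in one method means callers can always pass verse.getOrdinal() unchanged.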
letters
private long[] letters
- Some shortcuts into the list of names to help startsWith
version
private com.eireneh.bible.book.Version version
- The Version of the Bible that this produces
log
protected static com.eireneh.util.Logger log
- The log stream
Constructor Detail
SerBible
public SerBible(java.lang.String name, java.net.URL url, boolean create) throws com.eireneh.bible.book.BookException
- Basic constructor for a SerBible
Method Detail
getDriver
public com.eireneh.bible.book.BibleDriver getDriver()
- What driver is controlling this Bible?
getName
public java.lang.String getName()
- Meta-Information: What name can I use to get this Bible in a call
to Bibles.getBible(name);
getVersion
public com.eireneh.bible.book.Version getVersion()
- Meta-Information: What version of the Bible is this?
setVersion
public void setVersion(com.eireneh.bible.book.Version version)
- Setup the Version information
getText
public java.lang.String getText(com.eireneh.bible.passage.VerseRange range) throws com.eireneh.bible.book.BookException
- Create a String for the specified Verses
getElement
public org.jdom.Element getElement(com.eireneh.bible.passage.Passage ref) throws com.eireneh.bible.book.BookException
- Retrieval: Use JDOM to retrieve some Bible data
getDocument
public void getDocument(com.eireneh.bible.book.BibleEle doc, com.eireneh.bible.passage.Passage ref) throws com.eireneh.bible.book.BookException
- Create an XML document for the specified Verses
findPassage
public com.eireneh.bible.passage.Passage findPassage(java.lang.String word) throws com.eireneh.bible.book.BookException
- For a given word find a list of references to it
getStartsWith
public java.lang.String[] getStartsWith(java.lang.String word) throws com.eireneh.bible.book.BookException
- Retrieval: Return an array of words that are used by this Bible
that start with the given string. For example calling:
getStartsWith("love")
will return something like: { "love", "loves", "lover", "lovely", ... }
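One way a prefix lookup like this can work over a sorted word list is shown below. This is a sketch using java.util.SortedSet.subSet, not the implementation in SerBible (which appears to use the letters shortcuts); the class PrefixLookup is hypothetical.

```java
import java.util.SortedSet;

/** Hypothetical sketch of a starts-with lookup over a sorted word list. */
final class PrefixLookup {
    /** Returns, in sorted order, the words that start with the given prefix. */
    static String[] startsWith(SortedSet<String> words, String prefix) {
        // Every word with this prefix sorts at or after the prefix itself and
        // before the prefix followed by the largest char value. Words whose
        // next character is '\uffff' itself are missed, which does not matter
        // for ordinary text.
        SortedSet<String> range = words.subSet(prefix, prefix + Character.MAX_VALUE);
        return range.toArray(new String[0]);
    }
}
```

With a set holding "lot", "love", "lovely", "lover", "loves" and "low", calling PrefixLookup.startsWith(words, "love") returns the four words beginning with "love" and skips "lot" and "low".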
listWords
public java.util.Enumeration listWords() throws com.eireneh.bible.book.BookException
- Retrieval: Get a list of the words used by this Version. This is
not vital for normal display, however it is very useful for various
things, not least of which is new Version generation. However if
you are only looking to display from this Bible then you
could skip this one.
setDocument
public void setDocument(com.eireneh.bible.book.BibleEle doc) throws com.eireneh.bible.book.BookException
- Write the XML to disk
foundPassage
public void foundPassage(java.lang.String word, com.eireneh.bible.passage.Passage ref) throws com.eireneh.bible.book.BookException
- Write the references for a Word
flush
public void flush() throws com.eireneh.bible.book.BookException
- Flush the data written to disk
getBaseURL
public java.net.URL getBaseURL()
- The directory that holds the RawBible files