Contains some notes about the Harvest Broker code intended for developers.

----------------------------------------------------------------------

Registry file format (stor_reg.c):

It's one file with a record-based format.  A record looks like:

        4 bytes in network-byte order for record size
        4 bytes in network-byte order for magic number
        4 bytes in network-byte order for record flag
        4 bytes in network-byte order for URL length
        n bytes of the URL
        4 bytes in network-byte order for Gatherer Name length
        n bytes of the Gatherer Name
        [... and so on for other ASCII fields (empty fields have length 0)...]
        4 bytes in network-byte order for number 1
        4 bytes in network-byte order for number 2
        [... and so on for other numeric fields ...]
        [...end of record of record-size bytes...]

The record header for each record includes:

        4 bytes in network-byte order for a record size
        4 bytes in network-byte order for a magic number
        4 bytes in network-byte order for a flag

The flag (an unsigned int) would mark a deleted or valid record, and other
stuff in the future.

With this format, the broker issues 2 read() calls per record: the
first to get the record size, the second to read() the n bytes of the
record.  The broker code would then check the magic number, do
something with the flag, then (if needed) parse out the record.  This
helps to cut the system calls down.

We might also want a header to the registry file that includes:

        4 bytes in network-byte order for a magic number
        4 bytes in network-byte order for a version number

and maybe some other things like:

        4 bytes in network-byte order for the number of records
        4 bytes in network-byte order for the number of deleted records
        4 bytes in network-byte order for the number of valid records

The version number would let us know for sure how many ASCII fields and
numeric fields there are for each record.
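As a rough illustration of the record layout above, here is a minimal C sketch
that decodes the 12-byte record header and walks the length-prefixed ASCII
fields of a record that has already been read into memory.  The names
reg_rec_hdr, reg_parse_hdr() and reg_next_field() are made up for this example
and are not taken from stor_reg.c.  It also assumes that the buffer starts at
the record-size field.

```c
#include <arpa/inet.h>  /* ntohl() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_HDR_SIZE 12u        /* size + magic + flag, 4 bytes each */

/* Hypothetical in-memory form of the record header (host byte order). */
struct reg_rec_hdr {
    uint32_t size;
    uint32_t magic;
    uint32_t flag;
};

/* Read one 4-byte network-order integer from a possibly unaligned buffer. */
static uint32_t get_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return ntohl(v);
}

/* Decode the record header.  Returns 0 on success, -1 if buf is too short. */
int reg_parse_hdr(const unsigned char *buf, size_t len, struct reg_rec_hdr *h)
{
    if (len < REG_HDR_SIZE)
        return -1;
    h->size  = get_u32(buf);
    h->magic = get_u32(buf + 4);
    h->flag  = get_u32(buf + 8);
    return 0;
}

/*
 * Fetch the length-prefixed ASCII field at offset *off.  On success, returns
 * a pointer to the field bytes (not NUL-terminated), stores the field length
 * in *flen (0 for an empty field), and advances *off past the field.
 * Returns NULL if the buffer is truncated.
 */
const unsigned char *reg_next_field(const unsigned char *buf, size_t len,
                                    size_t *off, uint32_t *flen)
{
    const unsigned char *p;
    uint32_t n;

    if (*off > len || len - *off < 4)
        return NULL;
    n = get_u32(buf + *off);
    if (len - *off - 4 < n)
        return NULL;
    p = buf + *off + 4;
    *flen = n;
    *off += 4 + (size_t)n;
    return p;
}
```

A caller would typically check the magic number and the flag right after
reg_parse_hdr(), and only call reg_next_field() (starting at offset
REG_HDR_SIZE) for records that are still valid.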
The record counts in the registry header would help to determine when to
garbage collect the registry file, but they would need to be continually
updated.

So, the whole file looks like:

        [registry header of 20 bytes]
        [record header of 12 bytes]     --------|
        [record data of n bytes]                |
        [...]                                   | n records
        [record header of 12 bytes]             |
        [record data of n bytes]        --------|

The problem with this format is garbage collection.  When you delete an entry,
you just set the flag in its record header to mark the record as deleted, and
append the new one to the end, so dead records accumulate.  However, the
Broker will compress the Registry every so often.

----------------------------------------------------------------------

Below are the valid Query manager flags to the indexers:

Common:
        #desc                           Show Description Lines
        #opaque                         Force no matched lines

Glimpse:
        #index case insensitive         Case Insensitive
        #index error number             Allow "number" errors
        #index matchword                Matches on word boundaries
        #index maxresult number         Allow max of "number" results

Wais:
        #index maxresult number         Allow max of "number" results

----------------------------------------------------------------------

Each SOIF object in the Registry contains the following attributes:

        URL                     MANDATORY
        Gatherer-Name           MANDATORY
        Gatherer-Host           MANDATORY
        Gatherer-Version        MANDATORY
        Update-Time             MANDATORY
        MD5                     OPTIONAL
        Description             OPTIONAL

Two objects are the same if they both have the same:

        Gatherer-Name, Gatherer-Host, Gatherer-Version, Update-Time

and either the same URL or the same MD5.

----------------------------------------------------------------------

Running the Broker:

To start the Broker, type:

        % broker /your/broker.conf [-new | -nocol]

The -new flag causes the broker to begin a new collection.  The broker will do
a collection immediately by default, rather than waiting for the normal
collection time.  This is useful for starting the Broker the very first time.
If you don't want the broker to do a collection on startup, then use the
-nocol flag instead.
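Going back to the object-identity rule in the SOIF attribute notes above, the
comparison could be sketched in C roughly as follows.  The struct reg_obj and
the function reg_obj_same() are hypothetical names for this example only, and
so is the choice of time_t for Update-Time.  Since MD5 is OPTIONAL, the sketch
also assumes that a missing MD5 (NULL) never counts as a match; the notes do
not say how the Broker treats that case.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical in-memory view of a Registry entry (illustration only). */
struct reg_obj {
    const char *url;
    const char *gatherer_name;
    const char *gatherer_host;
    const char *gatherer_version;
    time_t      update_time;
    const char *md5;            /* OPTIONAL attribute: NULL when absent */
};

/* Two strings match only if both are present and equal. */
static int str_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/*
 * Returns 1 if a and b are the same object: same Gatherer-Name,
 * Gatherer-Host, Gatherer-Version and Update-Time, and either the same
 * URL or the same MD5.  Returns 0 otherwise.
 */
int reg_obj_same(const struct reg_obj *a, const struct reg_obj *b)
{
    if (!str_eq(a->gatherer_name, b->gatherer_name))
        return 0;
    if (!str_eq(a->gatherer_host, b->gatherer_host))
        return 0;
    if (!str_eq(a->gatherer_version, b->gatherer_version))
        return 0;
    if (a->update_time != b->update_time)
        return 0;
    return str_eq(a->url, b->url) || str_eq(a->md5, b->md5);
}
```

With this rule, two summaries of the same content fetched under different
URLs still compare equal as long as their MD5 values match.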
----------------------------------------------------------------------

Gatherer Bookkeeping Attributes:

        Update-Time             - The time that the summary object was last
                                  updated.  REQUIRED field, no default.
        Last-Modification-Time  - The L-M-T of the object itself.
                                  Defaults to 0.
        MD5                     - The unique string identifying the object
                                  itself.  Defaults to NULL.
        Refresh-Rate            - The number of seconds after Update-Time
                                  when the summary object is to be
                                  re-generated.  Defaults to 1 week.
        Time-to-Live            - The number of seconds after Update-Time
                                  when the summary object is no longer
                                  valid.  Defaults to 1 month.

----------------------------------------------------------------------

The Broker's Query Result set (we're in the middle of redoing it, sorry) is a
stream of newline separated items with a 3 digit code, space, hyphen, and space
at the beginning of each line.  It looks like this:

        101 - Message to the User
        103 - Error Message to the User
        111 - Error Message to the User that ends the Broker results
        120 - URL of the Match
        122 - Opaque data
        124 - nbytes\nnbytes of Description
        125 - URL of the SOIF object
        126 - URL of the Broker's home page
        130 - End of Object marker

The line '200 - ...' is always sent first (for the version) and should always
be *ignored*.  This message may be sent a few times during the output to test
the connection, so ignore it.

For bulk transfers:

        000 - Bulk xfer success
        400 - Bulk xfer error

--------------------------------------------------------------------

Glimpse Performance Issues:  Limiting the lifetime of 'glimpse' queries:

This is the broker's view of things right now; so far it works very well...

  1. The Broker runs 'glimpse', and allows it to run for LIFETIME seconds;
     it also puts a *hard* time limit of LIFETIME CPU-seconds using
     setrlimit.
  2. After LIFETIME seconds, if 'glimpse' has not exited, then the Broker
     sends SIGTERM to 'glimpse', sleeps for a few seconds, and sends
     SIGKILL to 'glimpse'.
  3. The Broker sends SIGUSR1 to 'glimpseserver' to make sure that it
     really did clean up.  The SIGTERM to 'glimpse' should send
     'glimpseserver' a SIGPIPE which will also cause a cleanup.  But the
     redundancy helps...
  4. The Broker uses whatever results 'glimpse' returned as the result set
     and then sends it to the user.  This is nice for very heavily loaded
     brokers: you can give each user a small time slice worth of result
     sets.

Use <INPUT TYPE="hidden" NAME="lifetime" VALUE="LIFETIME"> in your query.html
to change the lifetime per query to LIFETIME.

The MAX_LIFETIME seconds value is configurable in the Broker's broker.conf
file.  LIFETIME is always between 10 seconds and MAX_LIFETIME seconds.  By
default, LIFETIME == MAX_LIFETIME, but LIFETIME can be passed along via
query.html.  See Glimpse-MaxLife in broker.conf.

--------------------------------------------------------------------

Debugging:  Use -Dsection,level (or -Dsection for everything) after the
            broker.conf arg in broker

        registry.c      section 70, uses level 1, 5, and 9      REGISTRY
        collector.c     section 71, uses level 1
        parser.c        section 72, uses level 1
        registry.c      section 73, uses level 1, 5, and 9      HASH TABLES
        stor_man.c      section 74, uses level 1
        query_man.c     section 75, uses level 1
        event.c         section 76, uses level 1
        main.c          section 77, uses level 1
        select_loop.c   section 78, uses level 9

--------------------------------------------------------------------

WIP: Proposed query result interface specification (3/95):

  BrokerReturn           --> Version Header Body Trailer
  Version                --> INTERFACEVERSION Separator VersionRev
  VersionRev             --> MajorNumber MinorNumber string
  MajorNumber            --> number
  MinorNumber            --> number
  Header                 --> InfoField Header
  Header                 -->
  InfoField              --> BROKER_URL Separator string
  InfoField              --> BROKER_INDEXER Separator string
  InfoField              --> BROKER_COLLECT Separator string
  InfoField              --> MESSAGE_TO_USER Separator string
  InfoField              --> USER_EXT Separator UserExtType Separator Data
  UserExtType            --> string
  Body                   --> BulkTransfer
  Body                   --> ObjectList
  Body                   -->
  BulkTransfer           --> CompressedBulkTransfer
  BulkTransfer           --> RawBulkTransfer
  CompressedBulkTransfer --> STARTMARKER "gzip'd RawBulkTransfer" ENDMARKER
  RawBulkTransfer        --> @DELETE { SOIFStream } RawBulkTransfer
  RawBulkTransfer        --> @UPDATE { SOIFStream } RawBulkTransfer
  RawBulkTransfer        --> @REFRESH { SOIFStream } RawBulkTransfer
  RawBulkTransfer        -->
  SOIFStream             --> SingleSOIFObject SOIFStream
  SOIFStream             -->
  ObjectList             --> Object ObjectList
  ObjectList             -->
  Object                 --> OptWarning ResourceURL ObjectURL OptExt ObjectEnd
  OptWarning             --> WARNING Separator WarningNumber string
  OptWarning             -->
  WarningNumber          --> number
  ResourceURL            --> RESOURCE Separator string
  ObjectURL              --> OBJECT Separator string
  OptExt                 --> DescData OptExt
  OptExt                 --> OpaqueData OptExt
  OptExt                 --> AttributeData OptExt
  OptExt                 --> UserExtData OptExt
  OptExt                 -->
  DescData               --> DESCRIPTION Separator Data
  OpaqueData             --> OPAQUE Separator Data
  AttributeData          --> ATTRIBUTE Separator AttrString Separator Data
  AttrString             --> string
  UserExtData            --> USEREXTENSION Separator ExtentionType Separator Data
  ExtentionType          --> string
  ObjectEnd              --> OBJEND
  Trailer                --> ObjectCount
  Trailer                --> Error
  Trailer                --> Stats
  Trailer                -->
  ObjectCount            --> OBJCOUNT Separator number
  Error                  --> ERROR Separator ErrorNumber string
  Stats                  --> STATS Separator Data
  ErrorNumber            --> number
  Data                   --> MagicNumber Nbytes NbytesOfData
  Nbytes                 --> number
  string                 --> [^\n]*\n
  number                 --> htonl(number)
  MagicNumber            --> htonl(0x329fa1d2)
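
To make the Data production in the proposed grammar a bit more concrete, here
is a small C sketch that frames and unframes a Data block: the magic number,
then the byte count, then the bytes themselves.  The function names
data_encode() and data_decode() are invented for this example and do not come
from the Broker source.  The sketch assumes that Nbytes counts only the
payload bytes and that every "number" is 4 bytes wide (which is what htonl()
produces).

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_MAGIC 0x329fa1d2u  /* MagicNumber from the grammar */

/*
 * Write MagicNumber, Nbytes and the payload into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t data_encode(const void *payload, uint32_t nbytes,
                   unsigned char *out, size_t outlen)
{
    uint32_t w;

    if (outlen < 8 || outlen - 8 < nbytes)
        return 0;
    w = htonl(DATA_MAGIC);
    memcpy(out, &w, 4);
    w = htonl(nbytes);
    memcpy(out + 4, &w, 4);
    if (nbytes > 0)
        memcpy(out + 8, payload, nbytes);
    return 8 + (size_t)nbytes;
}

/*
 * Check the magic number and the length of a Data block held in in.
 * On success, returns a pointer to the payload inside in and stores its
 * length in *nbytes.  Returns NULL on a bad magic number or truncation.
 */
const unsigned char *data_decode(const unsigned char *in, size_t inlen,
                                 uint32_t *nbytes)
{
    uint32_t magic, n;

    if (inlen < 8)
        return NULL;
    memcpy(&magic, in, 4);
    memcpy(&n, in + 4, 4);
    if (ntohl(magic) != DATA_MAGIC)
        return NULL;
    n = ntohl(n);
    if (inlen - 8 < n)
        return NULL;
    *nbytes = n;
    return in + 8;
}
```

Because the length travels ahead of the payload, a reader can carry arbitrary
bytes (including newlines) in Data without confusing them with the
newline-terminated "string" items.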